Download the PHP package envoymediagroup/columna without Composer

On this page you can find all versions of the PHP package envoymediagroup/columna. It is possible to download and install these versions without Composer. Possible dependencies are resolved automatically.

FAQ

After the download, you have to add a single include, require_once('vendor/autoload.php');, and then import the classes with use statements.

If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. In this case, credentials are needed to access such packages. If a package comes from a private repository, please use the auth.json textarea to insert the credentials. You can look here for more information.

  • Some hosting environments are not accessible via a terminal or SSH, so Composer cannot be used there.
  • Using Composer is sometimes complicated, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our own command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package columna

Columnar Analytics (in pure PHP)

On GitHub: https://github.com/envoymediagroup/columna

About the project

What does it do?

This library allows you to write and read a simple columnar file format in a performant way with a lightweight, pure PHP implementation.

Why columnar analytics in PHP?

This library started as a scratch-our-own-itch project at Envoy Media Group. We needed fast, columnar analytics that would work well with our all-PHP stack, but found PHP's support and performance for mainstream columnar formats (Parquet, ORC, etc.) to be lacking. So we rolled our own simple columnar format with its own speedy writer and reader.

How battle tested is it?

This library has been in production use as the backbone of Envoy's analytics and business intelligence since early 2022. It processes hundreds of thousands of reads and writes per day, serving both custom reports for business users and automated requests for monitoring and machine learning applications. Bug fixes, feature additions, and improvements are ongoing based on our experience using this library every day in production.

Installation

Add this library to your project using Composer:

composer require envoymediagroup/columna

File format

What file format does this library use to store data? The file extension .scf stands for Simple Columnar Format, and the format is simple: all the metadata about the file, its columns, and their definitions and offsets is stored on line 1 in a JSON header. The rest of the file is CSV-like data in a columnar arrangement (each column corresponds to one line in the file), using RLE compression with the Record Separator character (ASCII 0x1E) as the RLE delimiter. Some extra escaping is applied to strings to widen the range of values that can be stored and retrieved. See a sample file here.
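To make the run-length idea concrete, here is a minimal PHP sketch that encodes one column's values into a single line, using the Record Separator character ("\x1E") between tokens. The token layout (value, then run length) and the function names rle_encode_column() and rle_decode_column() are hypothetical, not the actual .scf encoding; the real format also applies extra string escaping, so use the library's Writer and Reader for real files.

```php
<?php
// Hypothetical sketch: NOT the actual .scf encoding used by columna.
// Each run is emitted as "<value><RS><count>" and runs are joined with RS,
// so a column like [a, a, a, b] becomes "a<RS>3<RS>b<RS>1".
const RS = "\x1E"; // ASCII Record Separator (decimal 30)

function rle_encode_column(array $values): string
{
    $tokens = [];
    $count = 0;
    $current = null;
    foreach ($values as $value) {
        $value = (string)$value; // values are compared as strings
        if ($count > 0 && $value === $current) {
            $count++; // extend the current run
            continue;
        }
        if ($count > 0) {
            // Close the previous run before starting a new one.
            $tokens[] = $current;
            $tokens[] = (string)$count;
        }
        $current = $value;
        $count = 1;
    }
    if ($count > 0) {
        $tokens[] = $current;
        $tokens[] = (string)$count;
    }
    // Assumes no value contains RS itself; the real format escapes strings.
    return implode(RS, $tokens);
}

function rle_decode_column(string $line): array
{
    if ($line === '') {
        return [];
    }
    $tokens = explode(RS, $line);
    $values = [];
    // Tokens alternate: value, count, value, count, ...
    for ($i = 0, $n = count($tokens); $i + 1 < $n; $i += 2) {
        $run = (int)$tokens[$i + 1];
        for ($j = 0; $j < $run; $j++) {
            $values[] = $tokens[$i];
        }
    }
    return $values;
}
```

Long runs of repeated values, which are typical when a dimension column holds few distinct values, collapse to a single value/count pair; that is where storing each column on its own line pays off.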

Usage

Writer

Each columnar file is specific to one date and one metric, with any number of dimensions. For this example, we will assume a metric named clicks and three dimensions named platform_id, site_id, and url. Note that we provide the headers and values as separate inputs to the Writer; this makes sense when we are working with large data sets and want to preserve some memory by not duplicating associative string keys on every array item.
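As an illustration of the memory trade-off described above, the sketch below converts associative rows into one header list plus positional value rows. split_rows() is a hypothetical helper, not part of columna, and whether the Writer expects exactly this shape is an assumption; check the Writer's actual signature before passing data to it.

```php
<?php
// Hypothetical helper (not part of columna): turn associative rows into a
// single header list plus positional value rows, so string keys such as
// "clicks" or "platform_id" are stored once instead of on every row.
function split_rows(array $assoc_rows): array
{
    if (count($assoc_rows) === 0) {
        return [[], []];
    }
    // Take the column order from the first row.
    $headers = array_keys(reset($assoc_rows));
    $values = [];
    foreach ($assoc_rows as $row) {
        $ordered = [];
        foreach ($headers as $header) {
            // Missing keys become null here; per the Data Types notes,
            // columna uses each column's empty value in place of nulls.
            $ordered[] = $row[$header] ?? null;
        }
        $values[] = $ordered;
    }
    return [$headers, $values];
}

// Example with the clicks metric and platform_id, site_id, url dimensions.
[$headers, $values] = split_rows([
    ['clicks' => 3, 'platform_id' => 1, 'site_id' => 7, 'url' => '/a'],
    ['clicks' => 5, 'platform_id' => 2, 'site_id' => 7, 'url' => '/b'],
]);
```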

Data Types

Currently supported data types include strings, ints, floats, bools, and a special "datetime" type. Datetimes are treated as strings except when evaluating query conditions, where they are parsed with strtotime() and compared with integer operators (>, <, =, etc.). Nested data is not currently supported. While it is possible to store JSON or other serializations in the string type, the engine will not unserialize these values, so nested values cannot be evaluated. Each column definition includes an empty value that is always used in place of nulls in the data set, so null is never stored in the files or returned when reading a file.
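The sketch below illustrates the two behaviors described above with hypothetical helper names: datetime conditions parsed with strtotime() and compared as integers, and nulls replaced by a column's empty value. The exact operator set the engine accepts beyond >, <, and = is an assumption here.

```php
<?php
// Hypothetical sketch of the behaviors described above; not columna's code.

// 1) Datetime conditions: parse both sides with strtotime() and compare the
//    resulting Unix timestamps as integers. The operator list is assumed.
function datetime_matches(string $value, string $operator, string $operand): bool
{
    $left = strtotime($value);
    $right = strtotime($operand);
    if ($left === false || $right === false) {
        throw new InvalidArgumentException('Unparseable datetime');
    }
    switch ($operator) {
        case '>':  return $left > $right;
        case '>=': return $left >= $right;
        case '<':  return $left < $right;
        case '<=': return $left <= $right;
        case '=':  return $left === $right;
        case '!=': return $left !== $right;
    }
    throw new InvalidArgumentException("Unsupported operator: $operator");
}

// 2) Nulls are never stored: substitute each column's empty value.
//    $empty_values is indexed the same way as $row.
function fill_nulls(array $row, array $empty_values): array
{
    foreach ($row as $i => $value) {
        if ($value === null) {
            $row[$i] = $empty_values[$i];
        }
    }
    return $row;
}
```

As a plain PHP string comparison, '1 March 2022' > '2022-02-15' is false (because "1" sorts before "2"), while datetime_matches('1 March 2022', '>', '2022-02-15') returns true, which is why parsing datetimes before comparing matters.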

Usage

Let's walk through using the Writer in the comments below:

Now we have a complete file at $file_path.

CombinedWriter

The regular Writer allows you to take a row-based data set and transform it into a columnar file. The CombinedWriter then allows you to take multiple existing columnar files and combine them into a new columnar file containing all the data from the provided files. This only works if the files you provide are all for the same metric, on the same date, with the same columns. You can use this to distribute the work of generating data sets and files across a large number of workers, and then use another worker to combine those results into a single large file containing all the data for that metric on that date. You can use it like so:

We now have a file at $combined_file_path with all the data in it from the array of $partial_files we collected.
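Because combining only works for files with the same metric, date, and columns, a distributed pipeline may want to check that before calling the CombinedWriter. This is a hypothetical pre-flight check; the metadata shape (arrays with 'metric', 'date', and 'columns' keys, keyed by file path) is an assumption for illustration, not columna's header format.

```php
<?php
// Hypothetical pre-flight check (not part of columna): every partial file
// must share the same metric, date, and columns before being combined.
// $file_metas maps file path => ['metric' => ..., 'date' => ..., 'columns' => [...]].
function assert_combinable(array $file_metas): void
{
    if (count($file_metas) === 0) {
        throw new InvalidArgumentException('No files to combine');
    }
    $first = reset($file_metas);
    foreach ($file_metas as $path => $meta) {
        foreach (['metric', 'date', 'columns'] as $key) {
            // Strict comparison: for 'columns' this is also order-sensitive.
            if ($meta[$key] !== $first[$key]) {
                throw new RuntimeException("$path differs in '$key'; cannot combine");
            }
        }
    }
}
```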

Reader

Here's how to read a file. Note that this library contains both Reader and BundledReader classes. They both do the same thing and you can use them interchangeably, but you will see a slight performance win by using the BundledReader because it reduces the number of include()s PHP has to perform. It's a small win that can add up at scale.

Call with arguments, get array results

To call the Reader normally with arguments:

Call with JSON string workload, get JSON+CSV string results

The Reader is designed for easy use when running a large number of requests distributed over many worker processes using an RPC or messaging framework such as AWS SQS, RabbitMQ, or our own envoymediagroup/lib-rpc. For this reason, the Reader can accept a string as its input and return a string as its output. The request string is a JSON serialization of the Reader arguments. For the result string, the first line is the metadata of the response encoded as JSON, and the following lines are the result data encoded as CSV with a bit of extra escaping for more safety in encoding/decoding strings. The Response class will handle unserializing this string for you. Be sure to use this Response class to parse results, as it will handle unescaping those strings properly.
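To show the shape of the result string described above, here is an illustrative PHP sketch that splits it into the JSON metadata line and CSV rows. It deliberately does not undo columna's extra escaping, which is exactly why the text recommends the Response class; treat split_response_string() as a hypothetical helper for understanding the format, not for production parsing.

```php
<?php
// Illustrative only: the real wire format applies extra string escaping, so
// production code should use columna's Response class instead.
// Line 1 is JSON metadata; the remaining lines are CSV rows.
// Assumes no field contains a raw newline.
function split_response_string(string $response): array
{
    $lines = explode("\n", rtrim($response, "\n"));
    $metadata = json_decode(array_shift($lines), true);
    if (!is_array($metadata)) {
        throw new UnexpectedValueException('First line is not a JSON object');
    }
    $rows = [];
    foreach ($lines as $line) {
        // Pass the escape character explicitly to keep behavior stable across PHP versions.
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }
    return [$metadata, $rows];
}
```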

An example caller:

An example worker:

Metadata

Metadata looks like this:

Results

Result data set looks like this. Note that you can reference the 'index' field in the 'column_meta' of the metadata to map the indexes in each record to the appropriate column names.
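A sketch of that mapping, assuming column_meta is keyed by column name and each entry carries an integer 'index' pointing into the result records (this structure is an assumption; inspect real metadata before relying on it). name_columns() is a hypothetical helper.

```php
<?php
// Hypothetical helper: re-key positional result records by column name using
// the 'index' field of each column_meta entry (assumed keyed by column name).
function name_columns(array $metadata, array $records): array
{
    // Build the index => column name map once, then reuse it for every record.
    $names = [];
    foreach ($metadata['column_meta'] as $name => $meta) {
        $names[$meta['index']] = $name;
    }
    $named = [];
    foreach ($records as $record) {
        $row = [];
        foreach ($record as $i => $value) {
            $row[$names[$i]] = $value;
        }
        $named[] = $row;
    }
    return $named;
}
```

Building the map once keeps the per-record work to plain foreach loops, in keeping with the library's preference for simple iteration.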

Q&A

Why didn't you use library X, built-in function Y, or design pattern Z?

The short answer is performance. I kept the requirements of this library as small as possible to make the autoload very lightweight and reduce time spent include()ing files, which adds up quickly when you are optimizing for every millisecond. Many of PHP's built-in array functions actually run slower than foreaching the same array. Design patterns with more abstraction mean more classes and more weight. Keeping it simple keeps it fast.
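If you want to check the foreach claim on your own stack, a rough micro-benchmark like the one below compares array_map() with a closure against a plain foreach. It is a hypothetical sketch, not from columna's test suite; results depend on PHP version, OPcache and JIT settings, and hardware, so no particular outcome is implied.

```php
<?php
// Rough micro-benchmark (hypothetical): double every element two ways.
// A single run is noisy; repeat and average for anything meaningful.
function double_with_array_map(array $values): array
{
    return array_map(function ($v) { return $v * 2; }, $values);
}

function double_with_foreach(array $values): array
{
    $out = [];
    foreach ($values as $k => $v) {
        $out[$k] = $v * 2;
    }
    return $out;
}

// Elapsed wall-clock time of one call, in nanoseconds (hrtime needs PHP 7.3+).
function time_ns(callable $fn, array $values): int
{
    $start = hrtime(true);
    $fn($values);
    return hrtime(true) - $start;
}

$values = range(1, 100000);
printf("array_map: %d ns\n", time_ns('double_with_array_map', $values));
printf("foreach:   %d ns\n", time_ns('double_with_foreach', $values));
```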

Issues, Feature Requests

See the open issues for a full list of known issues or to submit an issue or feature request.

Of course, if you spot any egregious bugs or security holes, please create an issue and notify me right away (contact info below).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Copy .env.base to .env (required) and update any environment variables (optional)
  3. Run docker-compose up
  4. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  5. Make changes
  6. Run docker exec -it columna composer run test to make sure the unit tests pass
  7. Run docker exec -it columna composer run bundle to create a new BundledReader.php
  8. Commit your Changes (git commit -m 'Add some AmazingFeature')
  9. Push to the Branch (git push origin feature/AmazingFeature)
  10. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Creator: Ryan Marlow

Twitter: @myanrarlow


Acknowledgments

Here are some resources I've found helpful for this project.


All versions of columna with dependencies

Requires:
  • php ^7.3 || ^8.0
  • ext-json *
  • ext-mbstring *
