Download the PHP package crazzy501/gpt3-tokenizer without Composer

On this page you can find all versions of the PHP package crazzy501/gpt3-tokenizer. You can download and install these versions without Composer. Dependencies are resolved automatically.

FAQ

After the download, add a single include, require_once('vendor/autoload.php');, and then import the classes with use statements.

If you use only one package, a project is not needed. But if you use more than one package, you cannot import the classes with use statements unless you create a project.

In general, it is recommended to always use a project to download your libraries, because an application normally needs more than one library.

Some PHP packages are not free to download and are therefore hosted in private repositories. Credentials are needed to access such packages. If a package comes from a private repository, use the auth.json textarea to enter your credentials.

  • Some hosting environments cannot be accessed through a terminal or SSH, so Composer cannot be used there.
  • Composer can be complicated to use, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package gpt3-tokenizer

GPT3Tokenizer for PHP

This is a PHP port of the GPT-3 tokenizer. It is based on the original Python implementation and the Node.js implementation.

GPT-2 and GPT-3 use a technique called byte pair encoding (BPE) to convert text into a sequence of integers, which are then used as input for the model. When you interact with the OpenAI API, you may find it useful to calculate the number of tokens in a given text before sending it to the API.
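To make the idea of byte pair encoding concrete, here is a minimal toy sketch. It is not the package's implementation: the function name toyBpe, the merge rules, and the tiny vocabulary are all hypothetical, and the real tokenizer works on bytes with a far larger rule set loaded from its vocabulary files. The sketch only shows the core loop: start from single characters and keep merging the adjacent pair with the best (lowest) rank until no rule applies.

```php
<?php
declare(strict_types=1);

/**
 * Toy byte pair encoding (illustration only, not the package's code).
 * Each pass merges one occurrence of the adjacent pair with the lowest rank.
 *
 * @param string[]          $symbols Initial symbols (here: single characters).
 * @param array<string,int> $ranks   Hypothetical merge rules keyed by "left right".
 * @return string[] Symbols after all applicable merges.
 */
function toyBpe(array $symbols, array $ranks): array
{
    while (count($symbols) > 1) {
        $bestRank = PHP_INT_MAX;
        $bestIndex = -1;
        // Find the adjacent pair with the lowest rank (leftmost on ties).
        for ($i = 0; $i < count($symbols) - 1; $i++) {
            $pair = $symbols[$i] . ' ' . $symbols[$i + 1];
            if (isset($ranks[$pair]) && $ranks[$pair] < $bestRank) {
                $bestRank = $ranks[$pair];
                $bestIndex = $i;
            }
        }
        if ($bestIndex === -1) {
            break; // no merge rule applies any more
        }
        // Replace the two symbols with their concatenation.
        $merged = $symbols[$bestIndex] . $symbols[$bestIndex + 1];
        array_splice($symbols, $bestIndex, 2, [$merged]);
    }
    return $symbols;
}

// Hypothetical merge rules and vocabulary, made up for this example.
$ranks = ['l o' => 0, 'lo w' => 1, 'e r' => 2];
$vocab = ['low' => 0, 'er' => 1];

$pieces = toyBpe(str_split('lower'), $ranks);
$tokenIds = array_map(fn (string $piece): int => $vocab[$piece], $pieces);

echo implode(' ', $pieces), PHP_EOL; // prints "low er"
```

The integers in $tokenIds play the role of the token sequence sent to the model, and count($tokenIds) is the token count for the input.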

If you want to learn more, read the Summary of the tokenizers from Hugging Face.

Support ⭐️

If you find my work useful, I would be thrilled if you could show your support by giving this project a star ⭐️. It only takes a second and it would mean a lot to me. Your star will not only make me feel warm and fuzzy inside, but it will also help reach more people who can benefit from this project.

Installation

Install the package from Packagist using Composer: composer require crazzy501/gpt3-tokenizer

Testing

Loading the vocabulary files consumes a lot of memory, so you might need to increase the PHPUnit memory limit, for example by running PHPUnit through php -d memory_limit=1G. See https://stackoverflow.com/questions/46448294/phpunit-coverage-allowed-memory-size-of-536870912-bytes-exhausted

Use the configuration class

A note on caching

The tokenizer tries to use APCu for caching. If APCu is not available, it falls back to a plain PHP array. You will see slightly better performance for long texts when using the cache. The cache is enabled by default.
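The fallback described above can be sketched roughly as follows. This is an assumption about the general pattern, not the package's actual cache class: FallbackCache and its get and set methods are hypothetical names. Only the APCu functions (apcu_enabled, apcu_fetch, apcu_store) are real PHP extension functions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical cache wrapper (not the package's own class): uses APCu when
 * the extension is loaded and enabled, otherwise a plain PHP array.
 */
final class FallbackCache
{
    /** @var array<string,mixed> In-process fallback storage. */
    private array $local = [];

    private bool $useApcu;

    public function __construct()
    {
        // apcu_enabled() only exists when the APCu extension is loaded.
        $this->useApcu = function_exists('apcu_enabled') && apcu_enabled();
    }

    /** Returns the cached value, or null when the key is not cached. */
    public function get(string $key): mixed
    {
        if ($this->useApcu) {
            $value = apcu_fetch($key, $found);
            return $found ? $value : null;
        }
        return $this->local[$key] ?? null;
    }

    public function set(string $key, mixed $value): void
    {
        if ($this->useApcu) {
            apcu_store($key, $value);
            return;
        }
        $this->local[$key] = $value;
    }
}

// Usage: remember the result of an expensive step under a string key.
$cache = new FallbackCache();
if ($cache->get('example:lower') === null) {
    $cache->set('example:lower', ['low', 'er']);
}
```

The array fallback only lives for the duration of one PHP process, while APCu keeps entries in shared memory, which is one plausible reason the benefit shows up mainly for long or repeated texts.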

Encode a text

Decode a text

Count the number of tokens in a text

Encode a given text into chunks of tokens, with each chunk containing a specified maximum number of tokens.

This method is useful when handling large texts that need to be divided into smaller chunks for further processing.

Takes a given text and chunks it into encoded segments, with each segment containing a specified maximum number of tokens.

This method leverages the encodeInChunks method for encoding the text into Byte-Pair Encoded (BPE) tokens and then decodes these tokens back into text.
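The relationship between the two chunking methods can be illustrated with a small self-contained sketch. Everything here is hypothetical: ToyTokenizer is a stand-in that assigns one token per whitespace-separated word, and sketchEncodeInChunks and sketchChunk are illustrative helpers, not the package's methods or signatures. The sketch only shows the flow described above: encode, split the tokens into chunks of at most N tokens, then decode each chunk back into text.

```php
<?php
declare(strict_types=1);

/**
 * Stand-in tokenizer for illustration only: one token per whitespace-separated
 * word, with IDs assigned in order of first appearance. Real BPE tokens differ.
 */
final class ToyTokenizer
{
    /** @var array<string,int> word => ID */
    private array $ids = [];

    /** @var string[] ID => word */
    private array $words = [];

    /** @return int[] */
    public function encode(string $text): array
    {
        $tokens = [];
        foreach (preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
            if (!isset($this->ids[$word])) {
                $this->ids[$word] = count($this->words);
                $this->words[] = $word;
            }
            $tokens[] = $this->ids[$word];
        }
        return $tokens;
    }

    /** @param int[] $tokens */
    public function decode(array $tokens): string
    {
        return implode(' ', array_map(fn (int $id): string => $this->words[$id], $tokens));
    }
}

/**
 * Encode the text, then split the token list into chunks of at most $maxTokens.
 *
 * @return int[][]
 */
function sketchEncodeInChunks(ToyTokenizer $tokenizer, string $text, int $maxTokens): array
{
    return array_chunk($tokenizer->encode($text), $maxTokens);
}

/**
 * Same as above, but decode every chunk back into a text segment.
 *
 * @return string[]
 */
function sketchChunk(ToyTokenizer $tokenizer, string $text, int $maxTokens): array
{
    return array_map(
        fn (array $chunk): string => $tokenizer->decode($chunk),
        sketchEncodeInChunks($tokenizer, $text, $maxTokens)
    );
}

$tokenizer = new ToyTokenizer();
$segments = sketchChunk($tokenizer, 'the quick brown fox jumps', 2);
// $segments is ['the quick', 'brown fox', 'jumps']
```

With a real byte-level BPE tokenizer, a naive split like array_chunk can cut a multi-byte character across two chunks, so a production implementation has to take care at chunk boundaries.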

License

This project is licensed under the Apache License 2.0. See the LICENSE file for more information.


All versions of gpt3-tokenizer with dependencies

Requires:
  • php: ^8.2
  • ext-mbstring: *
