Download the PHP package everstu/gpt3-tokenizer without Composer
On this page you can find all versions of the php package everstu/gpt3-tokenizer. It is possible to download/install these versions without Composer. Possible dependencies are resolved automatically.
Information about the package gpt3-tokenizer
GPT3Tokenizer for PHP
This package is a fork of https://packagist.org/packages/gioni06/gpt3-tokenizer, modified to support PHP 7. Thanks to the original author!
This is a PHP port of the GPT-3 tokenizer. It is based on the original Python implementation and the Node.js implementation.
GPT-2 and GPT-3 use a technique called byte pair encoding to convert text into a sequence of integers, which are then used as input for the model. When you interact with the OpenAI API, you may find it useful to calculate the number of tokens in a given text before sending it to the API.
If you want to learn more, read the Summary of the tokenizers from Hugging Face.
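To make the idea concrete, here is a minimal sketch of a single byte pair encoding merge step in plain PHP. It is a toy illustration, not this package's implementation: the function names mostFrequentPair and mergePair are hypothetical, and a real GPT tokenizer operates on bytes with a fixed, pre-trained merge table instead of deriving merges from the input text.

```php
<?php
// Toy byte pair encoding step (illustrative only, not the package's code).

/**
 * Finds the most frequent adjacent pair of symbols.
 * Ties go to the pair seen first. Returns null when no pair exists.
 */
function mostFrequentPair(array $symbols)
{
    $counts = [];
    $pairs = [];
    for ($i = 0; $i + 1 < count($symbols); $i++) {
        // "\0" separates the two symbols so distinct pairs never share a key.
        $key = $symbols[$i] . "\0" . $symbols[$i + 1];
        $counts[$key] = ($counts[$key] ?? 0) + 1;
        $pairs[$key] = [$symbols[$i], $symbols[$i + 1]];
    }

    $best = null;
    $bestCount = 0;
    foreach ($counts as $key => $count) {
        if ($count > $bestCount) {
            $best = $pairs[$key];
            $bestCount = $count;
        }
    }
    return $best;
}

/**
 * Replaces every non-overlapping occurrence of $pair, scanning left to
 * right, with a single merged symbol.
 */
function mergePair(array $symbols, array $pair): array
{
    $merged = [];
    $i = 0;
    $n = count($symbols);
    while ($i < $n) {
        if ($i + 1 < $n && $symbols[$i] === $pair[0] && $symbols[$i + 1] === $pair[1]) {
            $merged[] = $pair[0] . $pair[1];
            $i += 2;
        } else {
            $merged[] = $symbols[$i];
            $i++;
        }
    }
    return $merged;
}

$symbols = str_split('aaabdaaabac');
$pair = mostFrequentPair($symbols);      // the pair "a", "a" occurs most often
$symbols = mergePair($symbols, $pair);   // "aa" is now a single symbol
```

Repeating these two steps grows a vocabulary of merged symbols; mapping each final symbol to an integer ID gives the kind of sequence a model consumes.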
Support ⭐️
If you find my work useful, I would be thrilled if you could show your support by giving this project a star ⭐️. It only takes a second and it would mean a lot to me. Your star will not only make me feel warm and fuzzy inside, but it will also help reach more people who can benefit from this project.
Installation
Install the package from Packagist using Composer: composer require everstu/gpt3-tokenizer
Testing
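Loading the vocabulary files consumes a lot of memory, so you might need to increase the PHPUnit memory limit, for example by running the tests with php -d memory_limit=-1 vendor/bin/phpunit (assuming PHPUnit is installed through Composer). See https://stackoverflow.com/questions/46448294/phpunit-coverage-allowed-memory-size-of-536870912-bytes-exhausted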
Use the configuration Class
A note on caching
The tokenizer will try to use APCu for caching. If APCu is not available, it falls back to a plain PHP array.
You will see slightly better performance for long texts when using the cache. The cache is enabled by default.
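The fallback behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the package's actual code: the class name FallbackCache and its get and set methods are hypothetical, and only the APCu functions (apcu_enabled, apcu_fetch, apcu_store) are real.

```php
<?php
// Hypothetical sketch: use APCu when available, otherwise a plain PHP array.
final class FallbackCache
{
    /** @var array in-process fallback store, lost when the script ends */
    private $local = [];

    /** @var bool whether the APCu extension is loaded and enabled */
    private $useApcu;

    public function __construct()
    {
        $this->useApcu = function_exists('apcu_enabled') && apcu_enabled();
    }

    /** Returns the cached value, or null when the key is not cached. */
    public function get(string $key)
    {
        if ($this->useApcu) {
            $value = apcu_fetch($key, $found);
            return $found ? $value : null;
        }
        return $this->local[$key] ?? null;
    }

    /** Stores a value under the given key. */
    public function set(string $key, $value)
    {
        if ($this->useApcu) {
            apcu_store($key, $value);
            return;
        }
        $this->local[$key] = $value;
    }
}

$cache = new FallbackCache();
$cache->set('greeting', [1, 2, 3]);
$cached = $cache->get('greeting');
```

Note that APCu is disabled on the command line unless apc.enable_cli is set, so a CLI script using a design like this would typically end up on the array path.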
Encode a text
Decode a text
Count the number of tokens in a text
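A common use of the token count is checking a prompt against a limit before calling the API. The helper below is a sketch, not part of the package: fitsTokenBudget is a hypothetical name, and it assumes only that the tokenizer object exposes a count() method that takes a string and returns an integer, an assumption based on the upstream gioni06/gpt3-tokenizer package. A stand-in tokenizer is used so the sketch runs without the package installed.

```php
<?php
/**
 * Hypothetical helper: reports whether $text fits within $maxTokens.
 * Assumes $tokenizer has a count(string $text): int method.
 */
function fitsTokenBudget($tokenizer, string $text, int $maxTokens): bool
{
    return $tokenizer->count($text) <= $maxTokens;
}

// Stand-in tokenizer for demonstration only. It counts whitespace-separated
// words, which is NOT how byte pair encoding counts tokens.
$fakeTokenizer = new class {
    public function count(string $text): int
    {
        return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
    }
};

$fits = fitsTokenBudget($fakeTokenizer, 'This is some text', 4);
```

In real use you would pass the package's tokenizer instance instead of the stand-in.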
License
This project is licensed under the Apache License 2.0. See the LICENSE file for more information.
All versions of gpt3-tokenizer with dependencies
ext-mbstring Version *
ext-json Version *