Because there was no C# implementation of the `cl100k_base` encoding (used by gpt-3.5-turbo), I have implemented a basic version with encoding and decoding methods, based on the official Rust implementation.
Currently, `cl100k_base` and `p50k_base` have been implemented. Other encodings will be added in future releases. If you encounter any problems or have questions, please feel free to submit them on the Issues page.
- `GetEncodingSetting` now supports the gpt-4 model and also accepts encoding names (e.g. `cl100k_base`) directly.
- Added `TikToken.PBEFileDirectory` to allow a custom storage directory for BPE files. The path must be set before calling `TikToken.EncodingForModel()`.
- Added the `p50k_base` encoding, which supports the text-davinci-003 model.
```csharp
using TiktokenSharp;

// gpt-3.5-turbo uses the cl100k_base encoding
TikToken tikToken = TikToken.EncodingForModel("gpt-3.5-turbo");
var i = tikToken.Encode("hello world"); // [15339, 1917]
var d = tikToken.Decode(i); // hello world
```