A Unity package for SentencePiece tokenization.
This package contains the .dll for Microsoft.ML.Tokenizers, which is part of ML.NET, as well as the following dependencies:
- Google.Protobuf
- Microsoft.Bcl.AsyncInterfaces
- System.Runtime.CompilerServices.Unsafe
- System.Text.Encodings.Web
- System.Text.Json
I mainly use this to implement tokenizers that rely on SentencePiece (such as Llama, T5, ...) as part of the com.doji.transformers package.