com.doji.sentencepiece

A Unity package for SentencePiece tokenization

Primary language: C#. License: MIT.

SentencePiece

About

This package contains the DLL for Microsoft.ML.Tokenizers, which is part of ML.NET, as well as the DLLs for the following dependencies:

  • Google.Protobuf
  • Microsoft.Bcl.AsyncInterfaces
  • System.Runtime.CompilerServices.Unsafe
  • System.Text.Encodings.Web
  • System.Text.Json

I mainly use this package to implement model-specific tokenizers that rely on SentencePiece (e.g. Llama, T5) as part of the com.doji.transformers package.