
Tokenizer vs SentencePiece: Implementation Similarity and Converting sentencepiece.model to JSON

Closed this issue · 0 comments

tylike commented

Is this tokenizer's implementation the same as Google's SentencePiece?
For example, will calling encode on the same input produce the same output in both?
If so, how can I convert a sentencepiece.model file to a JSON file?

Thank you.