mlc-ai/tokenizers-cpp

If the model doesn't have a tokenizer.json file, what should I do?

Closed this issue · 2 comments

Not all models on the Hugging Face Hub have a tokenizer.json file. Marian models, for example, ship these files instead:
'tokenizer_config.json',
'special_tokens_map.json',
'vocab.json',
'source.spm',
'target.spm',
'added_tokens.json'
That's a lot of files. What should I do?

vocab.json can be loaded and parsed into the tokenizer info.

It seems one common approach so far is to convert the other tokenizer formats into HF's tokenizer.json format.
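Two possible conversion routes, sketched below with heavy caveats: `transformers` can auto-convert many slow tokenizers to the fast (tokenizer.json-backed) format, though not every architecture has a converter registered, and Marian in particular may not; as a fallback, the `tokenizers` library can build a tokenizer directly from a SentencePiece .spm file, provided the .spm uses the unigram model type. Both helpers are hypothetical names for illustration:

```python
def export_tokenizer_json(model_id: str, out_path: str) -> None:
    """Try transformers' built-in slow->fast conversion, then serialize
    the Rust backend as tokenizer.json. Requires `transformers` (with
    sentencepiece support) and the model in the local cache or network
    access. Raises if no fast converter exists for the architecture."""
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    tok.backend_tokenizer.save(out_path)


def spm_to_tokenizer_json(spm_path: str, out_path: str) -> None:
    """Fallback sketch: build a tokenizer.json directly from a
    SentencePiece .spm file (e.g. Marian's source.spm) using the
    `tokenizers` library. Only works for unigram-type .spm models."""
    from tokenizers import SentencePieceUnigramTokenizer

    tok = SentencePieceUnigramTokenizer.from_spm(spm_path)
    tok.save(out_path)


# Example usage (not run here; needs the files/network):
#   export_tokenizer_json("Helsinki-NLP/opus-mt-en-de", "tokenizer.json")
#   spm_to_tokenizer_json("source.spm", "tokenizer.json")
```

Either way, the produced tokenizer.json should then be loadable from tokenizers-cpp.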