mlc-ai/tokenizers-cpp

If the model doesn't have a tokenizer.json file, what should I do?

Closed this issue · 2 comments

Not all models on the Hugging Face Hub have a tokenizer.json file. Marian models, for example, ship these files instead:
'tokenizer_config.json',
'special_tokens_map.json',
'vocab.json',
'source.spm',
'target.spm',
'added_tokens.json'
That's a lot of files. What should I do?

vocab.json can be loaded and parsed into the tokenizer info.

It seems one common approach so far is to convert the other tokenizer formats into HF's tokenizer.json format.
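Two possible conversion routes, sketched below with heavy caveats: `transformers` can auto-convert many slow tokenizers to the fast (tokenizer.json-backed) format, though not every architecture has a converter registered, and Marian in particular may not; as a fallback, the `tokenizers` library can build a tokenizer directly from a SentencePiece .spm file, provided the .spm uses the unigram model type. Both helpers are hypothetical names for illustration:

```python
def export_tokenizer_json(model_id: str, out_path: str) -> None:
    """Try transformers' built-in slow->fast conversion, then serialize
    the Rust backend as tokenizer.json. Requires `transformers` (with
    sentencepiece support) and the model in the local cache or network
    access. Raises if no fast converter exists for the architecture."""
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    tok.backend_tokenizer.save(out_path)


def spm_to_tokenizer_json(spm_path: str, out_path: str) -> None:
    """Fallback sketch: build a tokenizer.json directly from a
    SentencePiece .spm file (e.g. Marian's source.spm) using the
    `tokenizers` library. Only works for unigram-type .spm models."""
    from tokenizers import SentencePieceUnigramTokenizer

    tok = SentencePieceUnigramTokenizer.from_spm(spm_path)
    tok.save(out_path)


# Example usage (not run here; needs the files/network):
#   export_tokenizer_json("Helsinki-NLP/opus-mt-en-de", "tokenizer.json")
#   spm_to_tokenizer_json("source.spm", "tokenizer.json")
```

Either way, the produced tokenizer.json should then be loadable from tokenizers-cpp.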