What should I do if the model doesn't have a tokenizer.json file?
wolf-li commented
Not all models on the Hugging Face Hub have a tokenizer.json file, for example the Marian models. Instead they ship:
'tokenizer_config.json',
'special_tokens_map.json',
'vocab.json',
'source.spm',
'target.spm',
'added_tokens.json'.
That's a lot of files. What should I do?
FFengIll commented
vocab.json can be loaded and parsed to recover the tokenizer's vocabulary.
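For example, a minimal sketch of reading vocab.json, assuming it is a plain token-to-id JSON map as in the Marian checkpoints (the path is illustrative):

```python
import json

# Load the token-to-id mapping from vocab.json (path is illustrative).
with open("vocab.json", encoding="utf-8") as f:
    vocab = json.load(f)

# Invert it for id-to-token lookups, e.g. when decoding model output.
id_to_token = {idx: tok for tok, idx in vocab.items()}

print(f"vocab size: {len(vocab)}")
print(f"token for id 0: {id_to_token.get(0)}")
```

Note that the actual subword segmentation rules live in the source.spm / target.spm SentencePiece models; vocab.json only covers the token-to-id mapping.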
tqchen commented
It seems one common approach so far is to convert the other tokenizer formats into HF's tokenizer.json format.
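A minimal sketch of that conversion with transformers, assuming a fast-tokenizer converter is registered for the model's tokenizer class (the checkpoint name is illustrative):

```python
from transformers import AutoTokenizer, PreTrainedTokenizerFast

name = "Helsinki-NLP/opus-mt-en-zh"  # illustrative Marian checkpoint
tok = AutoTokenizer.from_pretrained(name, use_fast=True)

if isinstance(tok, PreTrainedTokenizerFast):
    # A fast tokenizer wraps a `tokenizers.Tokenizer`; saving it writes
    # out the tokenizer.json file that other runtimes expect.
    tok.backend_tokenizer.save("tokenizer.json")
else:
    # No fast converter is available for this tokenizer class, so
    # tokenizer.json would have to be assembled by hand from vocab.json
    # and the SentencePiece models.
    print(f"No fast tokenizer available for {name}")
```

If the conversion isn't supported for a given tokenizer class, the fallback is to build a tokenizers.Tokenizer manually from vocab.json and the .spm files and save that instead.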