Error loading the 1B model from Hugging Face

Question


Opened this issue a year ago · 1 comment

When I try to load `BBT-1-1B`, the tokenizer raises `TypeError: not a string`.
After debugging, I found that `vocab.txt` is not a valid file for loading.
If I don't pass the `vocab.txt` from the model directory to `T5Tokenizer.from_pretrained`, the error is `TypeError: not a string`.
If I do pass that `vocab.txt`, the error changes to `RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]`.
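Both errors are consistent with a file-format mismatch. In Hugging Face `transformers`, `T5Tokenizer` looks for a SentencePiece model file named `spiece.model`; when that file is missing, the path handed to SentencePiece is likely `None`, which would explain `TypeError: not a string`. Passing `vocab.txt` instead makes SentencePiece try to parse it as a serialized model, and the `ParseFromArray` failure suggests it is not one. A quick way to check is to look at the file's bytes. The helper below, `sniff_vocab_file`, is a hypothetical name and only a heuristic: a plain-text vocabulary decodes as UTF-8 with no NUL bytes, while a serialized SentencePiece model usually fails one of those checks.

```python
from pathlib import Path


def sniff_vocab_file(path: str) -> str:
    """Heuristically classify a tokenizer file as "text" or "binary".

    "text": valid UTF-8 with no NUL bytes, e.g. a one-token-per-line vocab.txt.
    "binary": anything else, e.g. a serialized SentencePiece model (spiece.model),
    which stores float scores and enum fields as raw bytes.
    """
    data = Path(path).read_bytes()
    if b"\x00" in data:
        # NUL bytes essentially never occur in a plain-text vocabulary.
        return "binary"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so not a plain-text token list.
        return "binary"
    return "text"
```

A result of `"text"` for `vocab.txt` would fit the `ParseFromArray` error: SentencePiece expects a binary model, not a token list.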

However, the `BBT-2-12B-Text` model does ship `spiece.model` and `spiece.vocab` for its tokenizer.

Providing `spiece.model` and `spiece.vocab` for `BBT-1-1B`, or showing an example of how to use `vocab.txt`, would be very helpful!
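If `vocab.txt` is plain text, it can be inspected directly to see what kind of vocabulary it is. The sketch below uses hypothetical helpers, `load_text_vocab` and `looks_like_sentencepiece_pieces`. It reads a one-token-per-line file into a token-to-id map, assuming the common convention that a token's id is its zero-based line number. It also accepts SentencePiece's `.vocab` layout, where each line is `piece<TAB>score`, by keeping only the piece. Tokens starting with `▁` (U+2581, SentencePiece's word-boundary marker) hint that the vocabulary was produced by SentencePiece. A text vocabulary does not carry the normalization rules stored in a `.model` file, though, so this check alone cannot replace the missing `spiece.model`.

```python
def load_text_vocab(path: str) -> dict[str, int]:
    """Read a one-token-per-line vocabulary into a token -> id mapping.

    Assumes a token's id is its zero-based line number. For SentencePiece
    .vocab files ("piece<TAB>score" per line) only the piece is kept.
    """
    vocab: dict[str, int] = {}
    with open(path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            # Drop the line ending, then any tab-separated score column.
            token = line.rstrip("\r\n").split("\t")[0]
            vocab[token] = index
    return vocab


def looks_like_sentencepiece_pieces(vocab: dict[str, int]) -> bool:
    """True if any token starts with SentencePiece's word-boundary marker U+2581."""
    return any(token.startswith("\u2581") for token in vocab)
```

If the tokens instead look like `[PAD]`, `[UNK]` and `##`-prefixed subwords, the file follows the WordPiece convention, and a WordPiece tokenizer class (for example `BertTokenizer` in `transformers`) may be the intended loader. That is only a guess that the model authors would need to confirm.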

Answer 1 · 2024-03-12T13:50:24.000Z

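> When I try to load `BBT-1-1B`, the tokenizer gives me `TypeError: not a string`. After debugging, I found that `vocab.txt` cannot be loaded. If I don't pass the `vocab.txt` from the model directory to `T5Tokenizer.from_pretrained`, the error is `TypeError: not a string`. If I do pass it, the error changes to `RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]`.
>
> However, the `BBT-2-12B-Text` model has `spiece.model` and `spiece.vocab` for its tokenizer.
>
> Providing `spiece.model` and `spiece.vocab`, or showing an example of using `vocab.txt`, would be very helpful!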

I encountered the same problem. How did you solve it? Thanks!