supersymmetry-technologies/BBT-FinCUGE-Applications

Error loading 1B model from Hugging Face


KaiWU5 commented

When I try to load `BBT-1-1B`, the tokenizer raises `TypeError: not a string`.
After debugging, I found that `vocab.txt` is not a valid file for the tokenizer to load.
If I don't pass the `vocab.txt` in the model directory to `T5Tokenizer.from_pretrained`, the error is `TypeError: not a string`.
If I do pass the `vocab.txt` in the model directory, the error changes to `RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]`.
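
Both errors would be consistent with the slow `T5Tokenizer` handing its vocabulary file straight to SentencePiece, which expects a serialized binary model rather than a plain-text token list. A minimal sketch for checking which kind of file you have is below. It uses only the standard library; the helper name `looks_like_text_vocab` is hypothetical, and the check is a rough heuristic, not a SentencePiece validator.

```python
from pathlib import Path


def looks_like_text_vocab(path):
    """Heuristic: True if the file is plain UTF-8 text, False if it looks binary.

    A serialized SentencePiece model is a binary protobuf, so it normally
    contains control bytes or byte sequences that are not valid UTF-8.
    A plain-text vocabulary (one token per line) contains neither.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Tab, line feed and carriage return are the only control bytes
    # expected in a text vocabulary.
    allowed = {0x09, 0x0A, 0x0D}
    return not any(b < 0x20 and b not in allowed for b in data)
```

If this returns `True` for `vocab.txt`, the file is a text token list and cannot be parsed as a SentencePiece model, which would match the `ParseFromArray` failure above.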

However, the `BBT-2-12B-Text` model does ship `spiece.model` and `spiece.vocab` for its tokenizer.

Providing `spiece.model` and `spiece.vocab`, or showing an example of how to use `vocab.txt`, would be very helpful!


I encountered the same problem. How did you solve it? Thanks.