Cannot load SentencePiece model
dcferreira opened this issue · 1 comments
dcferreira commented
I'm struggling with loading a sentencepiece model, and the error message is a bit cryptic so I'm not sure where to go next.
The error I get is the following:
2020-01-31 12:07:45.420864: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at sentencepiece_kernels.cc:211 : Internal: external/com_google_sentencepiece/src/sentencepiece_processor.cc(73) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
Traceback (most recent call last):
  File "load.py", line 4, in <module>
    tokenizer = tensorflow_text.SentencepieceTokenizer('model.model')
  File "/home/dferreira/projects/porn_classifier_tf2/venv/lib/python3.7/site-packages/tensorflow_text/python/ops/sentencepiece_tokenizer.py", line 79, in __init__
    model=model)
  File "<string>", line 51, in sentencepiece_op
  File "<string>", line 125, in sentencepiece_op_eager_fallback
  File "/home/dferreira/projects/porn_classifier_tf2/venv/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: external/com_google_sentencepiece/src/sentencepiece_processor.cc(73) [model_proto->ParseFromArray(serialized.data(), serialized.size())] [Op:SentencepieceOp]
I'm using Python 3.7.6 with:
tensorflow==2.1.0
tensorflow-text==2.1.0rc0
sentencepiece==0.1.85
The following is a minimal reproducible example:
- Create a file raw_text with the content:
This is a raw text file.
With 2 lines.
- Create train.py with the content:
import sentencepiece
sentencepiece.SentencePieceTrainer.Train('--input=raw_text --vocab_size=20 --model_prefix=model')
- Run python train.py. You will get a model.model and a model.vocab.
- Create load.py with the content:
import tensorflow_text
tokenizer = tensorflow_text.SentencepieceTokenizer('model.model')
- Run python load.py and you will get the error above.
It should be noted that loading the same model via sentencepiece.SentencePieceProcessor.Load works.
Like I said, I wasn't really able to interpret the error message.
How can I make this work?
thuang513 commented
The input is a serialized string containing the model (not the model file path). See [1] for an example of how to load the model file.
[1]
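In other words, the ParseFromArray failure happens because the literal string 'model.model' is being parsed as if it were the model proto itself. A minimal sketch of the fix, assuming the model.model file produced by train.py above: read the file as bytes and pass those bytes through the model argument. The helper names read_serialized_model and load_tokenizer are hypothetical and only used for illustration; they are not part of tensorflow_text.

```python
from pathlib import Path


def read_serialized_model(model_path):
    """Return the raw bytes of a SentencePiece model file (hypothetical helper)."""
    return Path(model_path).read_bytes()


def load_tokenizer(model_path):
    """Build a tokenizer from the file contents rather than the file path."""
    # Imported lazily so the byte-reading helper can be used on its own.
    import tensorflow_text

    serialized_model = read_serialized_model(model_path)
    # The `model` argument expects the serialized proto, not a path.
    return tensorflow_text.SentencepieceTokenizer(model=serialized_model)


if __name__ == "__main__":
    # Assumes model.model exists in the current directory (see train.py above).
    tokenizer = load_tokenizer("model.model")
    print(tokenizer.tokenize(["This is a raw text file."]))
```

With this change, load.py should get past the constructor; the token IDs printed depend on the trained model.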