shangjingbo1226/AutoNER

_pickle.UnpicklingError: pickle data was truncated on bio_embedding.pk

SeekPoint opened this issue · 3 comments

mldl@ub1604:~/ub16_prj/AutoNER$ md5sum models/BC5CDR/bio_embedding.pk
dd549629b7ea9cf97d7df62cd16c0e9f models/BC5CDR/bio_embedding.pk

mldl@ub1604:~/ub16_prj/AutoNER$ python3.6 preprocess_partial_ner/encode_folder.py --input_train models/BC5CDR/annotations.ck --input_testa data/BC5CDR/truth_dev.ck --input_testb data/BC5CDR/truth_test.ck --pre_word_emb models/BC5CDR/embedding.pk --output_folder models/BC5CDR/encoded_data
args.pre_word_emb is models/BC5CDR/embedding.pk
Traceback (most recent call last):
  File "preprocess_partial_ner/encode_folder.py", line 263, in
    w_emb = pickle.load(f)
_pickle.UnpicklingError: pickle data was truncated
mldl@ub1604:~/ub16_prj/AutoNER$
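For context, `pickle data was truncated` is what the C unpickler raises when its input ends before the pickle's final STOP opcode, i.e. the bytes it reads are shorter than what was originally written (an interrupted download, a partial copy, or a different file than the one that was checked). A minimal reproduction with the standard `pickle` module on made-up data; `load_truncated` is a hypothetical helper, and depending on where the cut falls the same problem can surface as `EOFError` instead:

```python
import pickle


def load_truncated(data, keep):
    """Try to unpickle only the first `keep` bytes of `data`.

    Returns the name of the exception raised, or None if loading succeeded.
    """
    try:
        pickle.loads(data[:keep])
    except (pickle.UnpicklingError, EOFError) as exc:
        return type(exc).__name__
    return None


# Stand-in for a word-embedding dict (hypothetical data, not AutoNER's format).
payload = pickle.dumps({"word%d" % i: [0.1 * i] * 50 for i in range(100)})

print(load_truncated(payload, len(payload)))       # None: the complete pickle loads
print(load_truncated(payload, len(payload) // 2))  # 'UnpicklingError' or 'EOFError'
```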

It seems to me that the filename is wrong: the file to check is models/BC5CDR/embedding.pk, not models/BC5CDR/bio_embedding.pk.

Note that the line printed just before the traceback shows args.pre_word_emb is models/BC5CDR/embedding.pk, so that is the file the script actually loads.

The MD5 sum looks good to me.

~/AutoNER$ md5sum models/BC5CDR/embedding.pk
dd549629b7ea9cf97d7df62cd16c0e9f  models/BC5CDR/embedding.pk
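A checksum only says something about the file it was computed on, so it is worth hashing exactly the path passed to `--pre_word_emb`. A minimal Python equivalent of the `md5sum` call, assuming the checksum posted above is the expected one for models/BC5CDR/embedding.pk; `md5sum` and `check_embedding` are illustrative helpers, not part of AutoNER:

```python
import hashlib

# Checksum posted in this thread for models/BC5CDR/embedding.pk (assumed canonical).
EXPECTED_MD5 = "dd549629b7ea9cf97d7df62cd16c0e9f"


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_embedding(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5sum(path) == expected


# Usage, from the AutoNER checkout:
# print(check_embedding("models/BC5CDR/embedding.pk"))
```

Hashing in chunks means even a large embedding file can be checked without loading it into memory.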

Sure, I made a stupid mistake.

mldl@ub1604:~/ub16_prj/AutoNER$ python3.6 preprocess_partial_ner/encode_folder.py --input_train models/BC5CDR/annotations.ck --input_testa data/BC5CDR/truth_dev.ck --input_testb data/BC5CDR/truth_test.ck --pre_word_emb models/BC5CDR/embedding.pk --output_folder models/BC5CDR/encoded_data
args.pre_word_emb is models/BC5CDR/embedding.pk
Killed

It still looks like an OOM (out-of-memory) kill.
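`Killed` with no traceback usually means the process received SIGKILL, most often from the Linux OOM killer (the kernel log, e.g. `dmesg`, normally records this). One rough check is to load the embedding pickle on its own and read the process's peak resident set size. This is a sketch assuming Linux, where `ru_maxrss` is reported in KiB; `load_and_measure` is a hypothetical helper, and `encode_folder.py` may allocate much more after the load, so a modest number here does not rule out an OOM later in the script:

```python
import pickle
import resource


def load_and_measure(path):
    """Unpickle `path` and return (object, peak RSS of this process in MiB).

    Assumes Linux, where ru_maxrss is in kilobytes
    (on macOS it is in bytes, so the MiB figure would be 1024x too large).
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return obj, peak_kib / 1024.0


# Usage, on the machine that printed "Killed":
# emb, peak_mib = load_and_measure("models/BC5CDR/embedding.pk")
# print("peak RSS after loading embeddings: %.0f MiB" % peak_mib)
```

Comparing that figure with the machine's available RAM shows whether the embedding load alone is close to the limit.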