converter.py AttributeError
vr25 opened this issue · 4 comments
Running converter.py (after preprocessing and saving the data into .tsv files), I get the following error:
File "BERT_train.py", line 74, in <module>
train_features = list(tqdm(p.imap(convert_examples_to_features.convert_example_to_feature, train_examples_for_processing), total=train_examples_len))
File "/gpfs/u/home/HPDM/HPDMrawt/scratch/miniconda3/envs/vdr/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1060, in __iter__
for obj in iterable:
File "/lib/python3.6/multiprocessing/pool.py", line 735, in next
raise value
AttributeError: 'NoneType' object has no attribute 'tokenize'
0%| | 0/560000 [00:00<?, ?it/s]
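The traceback ends in multiprocessing/pool.py rather than in the conversion code because `Pool.imap` runs the function in the pool's workers, captures any exception raised there, and re-raises it in the parent when the result is consumed (the `raise value` line). The sketch below reproduces the same message. It uses a thread-backed pool (`multiprocessing.dummy.Pool`, which shares that result-handling code) so it runs without pickling, and a hypothetical worker standing in for `convert_examples_to_features.convert_example_to_feature`:

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the same imap/next machinery

tokenizer = None  # simulates a tokenizer whose loading failed and left None behind


def convert_example_to_feature(text):
    # Hypothetical stand-in for the repo's worker: it uses the global tokenizer.
    return tokenizer.tokenize(text)


def convert_all(examples):
    with Pool(2) as p:
        # The worker's AttributeError is stored by the pool and re-raised here,
        # inside the iterator's next(), when list() pulls the first result.
        return list(p.imap(convert_example_to_feature, examples))


if __name__ == "__main__":
    try:
        convert_all(["A short review.", "Another one."])
    except AttributeError as exc:
        print(exc)  # 'NoneType' object has no attribute 'tokenize'
```

So the real failure happened inside a worker, and the line to inspect is wherever `tokenizer` was created.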
I'm going to need a little more information about what you were doing when you got the error.
I just updated my previous comment, thanks.
It looks like the tokenizer is not being loaded properly. Try printing out the tokenizer to see whether it is actually loaded in converter.py. It is loaded here:
from pytorch_pretrained_bert import BertTokenizer

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)
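Printing works, but a guard that stops the script before the worker pool starts gives a clearer message than an AttributeError from inside pool.py. The traceback shows the workers received `None` where a tokenizer was expected; older pytorch_pretrained_bert releases appear to log an error and return `None` from `from_pretrained` when the vocabulary cannot be resolved, instead of raising. A minimal sketch; `require_loaded` is a hypothetical helper, not part of this repo or of the library:

```python
def require_loaded(tokenizer, model_name):
    """Fail fast if a tokenizer loader handed back None instead of raising."""
    if tokenizer is None:
        raise RuntimeError(
            f"Tokenizer for {model_name!r} did not load; "
            "check the model name or path and the log output above."
        )
    return tokenizer
```

In converter.py this would wrap the existing call, e.g. `tokenizer = require_loaded(BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False), 'bert-base-cased')`, so the error appears on the loading line rather than inside a worker.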
Also, this repo and the pytorch_pretrained_bert library it is built on have both been updated since. I recommend using the newer versions instead.
Correct, thanks! I solved it by changing line 56 in converter.py from:
tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)
to:
tokenizer = BertTokenizer.from_pretrained(BERT_MODEL, do_lower_case=False)
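Presumably the fix works because the tokenizer now reads the same `BERT_MODEL` setting as the rest of the script instead of a second, hardcoded name. One way to keep the two from drifting apart again is to derive every tokenizer argument from that single setting. A sketch under stated assumptions: `load_tokenizer` and `tokenizer_settings` are hypothetical helpers, the `BERT_MODEL` value is illustrative (converter.py defines its own), and the loader is injected so the code runs without the library installed:

```python
from typing import Any, Callable

# Assumed value for illustration; converter.py defines its own BERT_MODEL.
BERT_MODEL = "bert-base-cased"


def tokenizer_settings(model_name: str = BERT_MODEL) -> dict:
    """Derive tokenizer keyword arguments from the one model setting.

    Convention assumed here: checkpoints whose name contains "uncased" expect
    lowercased input; cased models and other names do not.
    """
    return {"do_lower_case": "uncased" in model_name}


def load_tokenizer(load_fn: Callable[..., Any], model_name: str = BERT_MODEL) -> Any:
    # load_fn would be BertTokenizer.from_pretrained in converter.py.
    return load_fn(model_name, **tokenizer_settings(model_name))
```

With the library available, the call would be `tokenizer = load_tokenizer(BertTokenizer.from_pretrained)`, which for a cased model reproduces the line above.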