ThilinaRajapakse/BERT_binary_text_classification

converter.py AttributeError

vr25 opened this issue · 4 comments

vr25 commented

Running converter.py (after preprocessing and saving the data into .tsv files), I get the following error:

File "BERT_train.py", line 74, in <module>
train_features = list(tqdm(p.imap(convert_examples_to_features.convert_example_to_feature, train_examples_for_processing), total=train_examples_len))
File "/gpfs/u/home/HPDM/HPDMrawt/scratch/miniconda3/envs/vdr/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1060, in __iter__
for obj in iterable:
File "/lib/python3.6/multiprocessing/pool.py", line 735, in next
raise value
AttributeError: 'NoneType' object has no attribute 'tokenize'
0%| | 0/560000 [00:00<?, ?it/s]

ThilinaRajapakse commented
I'm going to need a little more information about what you were doing when you got the error.

vr25 commented

> I'm going to need a little more information about what you were doing when you got the error.

I just updated my previous comment, thanks.

ThilinaRajapakse commented
It looks like the tokenizer is not being loaded properly. Try printing out the tokenizer in converter.py to check that it actually loaded. It's being created here:

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)
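The error fits a tokenizer that never loaded: pytorch_pretrained_bert's `from_pretrained` logs a message and returns None (rather than raising) when the model name or vocab file cannot be resolved, so the failure only surfaces later, inside the worker processes, as this AttributeError. A minimal stand-alone sketch of that failure mode (no BERT download needed; `tokenizer = None` stands in for the failed load):

```python
# `None` stands in for what from_pretrained returns when the
# model name or vocab file cannot be resolved.
tokenizer = None

try:
    tokenizer.tokenize("Hello world")
except AttributeError as exc:
    # 'NoneType' object has no attribute 'tokenize'
    print(exc)
```

Printing the tokenizer right after loading it, as suggested above, catches this before the multiprocessing pool obscures where it came from.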

Also, this repo and the pytorch_pretrained_bert library it is built on have both been updated. I recommend using the newer versions instead.

vr25 commented

Correct, thanks. I solved it by editing line 56 in converter.py, changing

tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)

to

tokenizer = BertTokenizer.from_pretrained(BERT_MODEL, do_lower_case=False)
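A way to make this kind of misconfiguration loud instead of deferred is a small fail-fast guard around the load. Everything below is a sketch: `load_tokenizer` and the stub `from_pretrained` are hypothetical stand-ins for `BertTokenizer.from_pretrained`, which returns None for an unrecognised model name:

```python
KNOWN_MODELS = {"bert-base-cased", "bert-base-uncased"}  # illustrative subset

def from_pretrained(model_name):
    """Stub mimicking BertTokenizer.from_pretrained: returns None on failure."""
    return f"<tokenizer for {model_name}>" if model_name in KNOWN_MODELS else None

def load_tokenizer(model_name):
    """Hypothetical guard: raise immediately instead of letting worker
    processes hit "'NoneType' object has no attribute 'tokenize'" later."""
    tokenizer = from_pretrained(model_name)
    if tokenizer is None:
        raise ValueError(f"Could not load tokenizer for {model_name!r}")
    return tokenizer

print(load_tokenizer("bert-base-cased"))
```

With a guard like this, a bad BERT_MODEL value fails at load time with a clear message rather than 560,000 examples into feature conversion.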