ThilinaRajapakse/BERT_binary_text_classification

converter.py AttributeError

vr25 opened this issue · 4 comments

vr25 commented

Running converter.py (after preprocessing and saving the data into .tsv files), I get the following error:

File "BERT_train.py", line 74, in <module>
train_features = list(tqdm(p.imap(convert_examples_to_features.convert_example_to_feature, train_examples_for_processing), total=train_examples_len))
File "/gpfs/u/home/HPDM/HPDMrawt/scratch/miniconda3/envs/vdr/lib/python3.6/site-packages/tqdm/_tqdm.py", line 1060, in __iter__
for obj in iterable:
File "/lib/python3.6/multiprocessing/pool.py", line 735, in next
raise value
AttributeError: 'NoneType' object has no attribute 'tokenize'
0%| | 0/560000 [00:00<?, ?it/s]

ThilinaRajapakse commented
I'm going to need a little more information about what you were doing when you got the error.

vr25 commented

> I'm going to need a little more information about what you were doing when you got the error.

I just updated my previous comment, thanks.

ThilinaRajapakse commented
It looks like the tokenizer is not being loaded properly. Try printing out the tokenizer in converter.py to check that it actually loaded. It's being created here:

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)
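The error fits a tokenizer that never loaded: pytorch_pretrained_bert's `from_pretrained` logs a message and returns None (rather than raising) when the model name or vocab file cannot be resolved, so the failure only surfaces later, inside the worker processes, as this AttributeError. A minimal stand-alone sketch of that failure mode (no BERT download needed; `tokenizer = None` stands in for the failed load):

```python
# `None` stands in for what from_pretrained returns when the
# model name or vocab file cannot be resolved.
tokenizer = None

try:
    tokenizer.tokenize("Hello world")
except AttributeError as exc:
    # 'NoneType' object has no attribute 'tokenize'
    print(exc)
```

Printing the tokenizer right after loading it, as suggested above, catches this before the multiprocessing pool obscures where it came from.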

Also, this repo and the pytorch_pretrained_bert library it is built on have both been updated. I recommend using the newer versions instead.

vr25 commented

Correct, thanks. I solved it by editing line 56 in converter.py, changing

tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)

to

tokenizer = BertTokenizer.from_pretrained(BERT_MODEL, do_lower_case=False)
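A way to make this kind of misconfiguration loud instead of deferred is a small fail-fast guard around the load. Everything below is a sketch: `load_tokenizer` and the stub `from_pretrained` are hypothetical stand-ins for `BertTokenizer.from_pretrained`, which returns None for an unrecognised model name:

```python
KNOWN_MODELS = {"bert-base-cased", "bert-base-uncased"}  # illustrative subset

def from_pretrained(model_name):
    """Stub mimicking BertTokenizer.from_pretrained: returns None on failure."""
    return f"<tokenizer for {model_name}>" if model_name in KNOWN_MODELS else None

def load_tokenizer(model_name):
    """Hypothetical guard: raise immediately instead of letting worker
    processes hit "'NoneType' object has no attribute 'tokenize'" later."""
    tokenizer = from_pretrained(model_name)
    if tokenizer is None:
        raise ValueError(f"Could not load tokenizer for {model_name!r}")
    return tokenizer

print(load_tokenizer("bert-base-cased"))
```

With a guard like this, a bad BERT_MODEL value fails at load time with a clear message rather than 560,000 examples into feature conversion.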