kyzhouhzau/BERT-NER

test.txt and label_test.txt don't have the same number of lines

sEhsanTaher opened this issue · 3 comments

Hi,
I'm using the code (thanks for that!).
However, there is a problem when the test predictions are written to "output/result_dir/label_test.txt". I thought this file should match "data/test.txt" line for line, but it doesn't!

I know that this library (BERT-NER) removes empty lines in "output/result_dir/label_test.txt", but even after removing the empty lines from "data/test.txt" the problem still exists.
(The number of lines in "output/result_dir/label_test.txt" is smaller than in "data/test.txt".)
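
For anyone who wants to reproduce the mismatch, here is a minimal sketch that counts the non-blank lines in both files and reports the difference. The helper names `count_nonblank_lines` and `line_count_gap` are made up for illustration and are not part of BERT-NER; the only assumption is that both files are UTF-8 text with one token per line.

```python
def count_nonblank_lines(path):
    """Count the lines in a text file that contain something besides whitespace."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def line_count_gap(test_path, pred_path):
    """Return (non-blank lines in test file) - (non-blank lines in prediction file).

    A positive value means the prediction file is shorter than the test file.
    """
    return count_nonblank_lines(test_path) - count_nonblank_lines(pred_path)
```

Calling `line_count_gap("data/test.txt", "output/result_dir/label_test.txt")` from the repository root should give a positive number if the prediction file is the shorter one, as described above.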

Here are the links to those files:
"data/test.txt" : https://github.com/kyzhouhzau/BERT-NER/blob/master/data/test.txt

"output/result_dir/label_test.txt" : https://github.com/kyzhouhzau/BERT-NER/blob/master/output/result_dir/label_test.txt

thanks

After WordPiece tokenization, sentence length is not always shorter than 128 in the test set, I think.
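
To illustrate the idea in the comment above, here is a rough sketch of how a sentence can lose pieces once it is split into subwords and cut to a fixed length. Everything in it is an assumption for illustration: `toy_wordpiece` is a made-up stand-in for a real WordPiece tokenizer, 128 is the maximum sequence length mentioned in the comment, and reserving two slots for `[CLS]` and `[SEP]` is the usual BERT convention rather than something confirmed from this repository's code.

```python
MAX_SEQ_LENGTH = 128  # assumed value, taken from the comment above


def toy_wordpiece(word, piece_len=4):
    """Hypothetical tokenizer: split a word into fixed-size chunks.

    A real WordPiece tokenizer uses a vocabulary; this only mimics the
    effect that one word can become several pieces.
    """
    if not word:
        return []
    pieces = [word[i:i + piece_len] for i in range(0, len(word), piece_len)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def truncated_piece_count(words, tokenize=toy_wordpiece,
                          max_seq_length=MAX_SEQ_LENGTH):
    """Return (pieces before truncation, pieces kept after truncation)."""
    pieces = [p for w in words for p in tokenize(w)]
    budget = max_seq_length - 2  # assumed: two slots kept for [CLS] and [SEP]
    return len(pieces), min(len(pieces), budget)


# A 50-word sentence is well under 128 words, yet it exceeds the budget
# once every word is split into three pieces.
total, kept = truncated_piece_count(["tokenization"] * 50)
print(total, kept)
```

If the dropped pieces never get a predicted label written out, a sentence like this would contribute fewer lines to the prediction file than to the test file, which would be consistent with the mismatch reported in this issue.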

I have the same problem: output/result_dir/label_test.txt sometimes has more lines than data/test.txt and sometimes fewer. Did you solve your problem?

@kyzhouhzau Thanks for the code. I'm also confused about your "output/result_dir/label_test.txt" and "data/test.txt": those 2 files are completely different. From my understanding, "output/result_dir/label_test.txt" is generated only after the model has finished training, meaning that before I run "bash run_ner.sh" there is no "label_test.txt" in the "output/result_dir" folder? If my understanding is not correct, I would very much appreciate your clarification.