allenai/scibert

scibert throws InvalidTagSequence(tag_sequence) for BIOUL


I have a custom dataset in which the entities are encoded in BIOUL format. While training on this dataset, I get an InvalidTagSequence(tag_sequence) error.

Later I changed the encoding to IOB1 and training ran smoothly, with no InvalidTagSequence(tag_sequence) error.

Q1.
When predicting on a new sentence, I get the entities encoded in BIOUL format. Is this happening because IOB1 is being mapped to BIOUL, as mentioned here #50 (comment)?
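
For reference, the kind of IOB1-to-BIOUL mapping this question is about can be sketched in plain Python. This is only an illustration of the two schemes, not the code that AllenNLP or scibert actually runs; the function name `iob1_to_bioul` is made up for this example.

```python
from typing import List


def iob1_to_bioul(tags: List[str]) -> List[str]:
    """Convert IOB1 tags to BIOUL tags (hypothetical helper, for illustration only)."""
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        prefix, _, label = tag.partition("-")
        if prefix not in ("B", "I") or not label:
            raise ValueError(f"not an IOB1 tag at position {i}: {tag!r}")
        prev = tags[i - 1] if i > 0 else "O"
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        # A span starts at an explicit B-, or at an I- that follows O
        # or a tag with a different label.
        starts = prefix == "B" or prev == "O" or prev.partition("-")[2] != label
        # A span ends unless the next tag continues it with I- and the same label.
        ends = nxt != "I-" + label
        if starts and ends:
            out.append("U-" + label)  # single-token span
        elif starts:
            out.append("B-" + label)
        elif ends:
            out.append("L-" + label)
        else:
            out.append("I-" + label)
    return out


print(iob1_to_bioul(["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-LOC"]))
# ['B-PER', 'L-PER', 'O', 'U-LOC', 'B-LOC', 'L-LOC']
```

If something like this runs inside the dataset reader, it would explain why predictions come back in BIOUL even though the training file is IOB1.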

Q2.
If so, since BIOUL is also a valid format, why does scibert throw this error?

This is an AllenNLP issue. Can you share the error stack trace?
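
In the meantime, it may be worth checking whether every sentence in the dataset is a well-formed BIOUL sequence on its own terms. Below is a minimal sketch of such a check in plain Python. It does not use AllenNLP and is not its validation logic; `find_bioul_errors` is a hypothetical helper, and the rules it enforces are the usual BIOUL ones (B opens a span, I continues it, L closes it, U is a single-token span).

```python
from typing import List, Optional, Tuple


def find_bioul_errors(tags: List[str]) -> List[Tuple[int, str]]:
    """Return (position, message) pairs for every BIOUL violation found."""
    errors: List[Tuple[int, str]] = []
    open_label: Optional[str] = None  # label of the span opened by B- and not yet closed
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if tag != "O" and (prefix not in ("B", "I", "L", "U") or not label):
            errors.append((i, f"unknown tag {tag!r}"))
            open_label = None
            continue
        if tag == "O" or prefix in ("B", "U"):
            # These tags may only appear when no span is open.
            if open_label is not None:
                errors.append((i, f"{tag!r} appears before the {open_label} span was closed with L-"))
            open_label = label if prefix == "B" else None
        else:
            # I- and L- must continue an open span with the same label.
            if open_label != label:
                errors.append((i, f"{tag!r} does not continue an open {label} span"))
            open_label = label if prefix == "I" else None
    if open_label is not None:
        errors.append((len(tags), f"sequence ended before the {open_label} span was closed with L-"))
    return errors


print(find_bioul_errors(["B-PER", "L-PER", "O", "U-LOC"]))  # []
print(find_bioul_errors(["B-PER", "I-PER", "O"]))
```

Running it over each sentence and posting the first one that reports errors, together with the stack trace, should make the problem much easier to pin down.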