glample/tagger

Inconsistent conversion for IOBES to IOB

sbmaruf opened this issue · 1 comments

Example:
From eng.testb,

CRICKET NNP I-NP O
- : O O
LEICESTERSHIRE NNP I-NP I-ORG
TAKE NNP I-NP O
OVER IN I-PP O
AT NNP I-NP O
TOP NNP I-NP O
AFTER NNP I-NP O
INNINGS NNP I-NP O
VICTORY NN I-NP O
. . O O

The code updates the tag scheme with the update_tag_scheme() function, converting IOB to IOBES.
Then, during evaluation, it converts IOBES back to IOB here.
The output files look like the following:

CRICKET NNP I-NP O O
- : O O O
LEICESTERSHIRE NNP I-NP B-ORG O
TAKE NNP I-NP O O
OVER IN I-PP O O
AT NNP I-NP O O
TOP NNP I-NP O O
AFTER NNP I-NP O O
INNINGS NNP I-NP O O
VICTORY NN I-NP O O
. . O O O

where the last column is the predicted output and the column before it is the true tag.
LEICESTERSHIRE is tagged I-ORG in the given dataset, but when we write the output to the file we write B-ORG. Isn't this a wrong conversion? The results may differ because of this.
@glample

OK, I got it. The code converts IOBES to IOB2, whereas the original dataset is tagged in IOB1 format.
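
For anyone else who hits this, here is a minimal sketch of the round trip that I believe explains the difference. The function names (iob1_to_iob2, iob2_to_iobes, iobes_to_iob2) are hypothetical and written only for illustration; this is not the repository's code, just the standard definitions of the three schemes as I understand them.

```python
def iob1_to_iob2(tags):
    """IOB1 -> IOB2: the first token of every entity gets a B- prefix."""
    out = []
    prev = "O"
    for tag in tags:
        # In IOB1 an I- tag opens an entity when the previous token is
        # outside any entity or belongs to an entity of another type.
        if tag.startswith("I-") and (prev == "O" or prev[2:] != tag[2:]):
            out.append("B-" + tag[2:])
        else:
            out.append(tag)
        prev = tag
    return out


def iob2_to_iobes(tags):
    """IOB2 -> IOBES: mark single-token entities (S-) and entity ends (E-)."""
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt.startswith("I-")  # the entity goes on in the next token
        if tag.startswith("B-"):
            out.append(tag if continues else "S-" + tag[2:])
        elif tag.startswith("I-"):
            out.append(tag if continues else "E-" + tag[2:])
        else:
            out.append(tag)
    return out


def iobes_to_iob2(tags):
    """IOBES -> IOB2: S- becomes B-, E- becomes I-, the rest is unchanged."""
    out = []
    for tag in tags:
        if tag.startswith("S-"):
            out.append("B-" + tag[2:])
        elif tag.startswith("E-"):
            out.append("I-" + tag[2:])
        else:
            out.append(tag)
    return out


# Gold tags for "CRICKET - LEICESTERSHIRE TAKE" as they appear in eng.testb (IOB1).
gold_iob1 = ["O", "O", "I-ORG", "O"]
as_iobes = iob2_to_iobes(iob1_to_iob2(gold_iob1))
written = iobes_to_iob2(as_iobes)
print(as_iobes)  # ['O', 'O', 'S-ORG', 'O']
print(written)   # ['O', 'O', 'B-ORG', 'O']
```

So the single-token entity that is I-ORG in the IOB1 file comes back as B-ORG after the round trip. Both tags describe the same one-token ORG span; only the prefix on the first token differs.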