Why can't classification_report count the label "ORGANIZATION"?
iifiigii opened this issue
The result is:
                precision    recall  f1-score   support
        PRODUCT     0.884     0.840     0.862      1007
       LOCATION     0.927     0.760     0.835        50
         PERSON     0.000     0.000     0.000         2
      micro avg     0.885     0.835     0.859      1059
      macro avg     0.884     0.835     0.859      1059
But my dataset contains sequences like this:
pred:['ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
true:['ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
I believe the pred sequence should be changed to ['B-ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O'], and the true sequence likewise.
You can have a look at the example shown at
https://github.com/chakki-works/seqeval/blob/master/seqeval/metrics/sequence_labeling.py#L288
where every entity tag carries a B- or I- prefix:
>>> y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
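If the data only has bare labels, the prefixes could be added programmatically before calling classification_report. A minimal sketch, assuming each run of identical consecutive non-'O' labels is one entity; add_iob2_prefixes is a hypothetical helper, not part of seqeval:

```python
from typing import List


def add_iob2_prefixes(labels: List[str]) -> List[str]:
    """Convert bare labels (e.g. 'ORGANIZATION') to IOB2 tags.

    Assumption: a run of identical consecutive non-'O' labels is a single
    entity, so two adjacent entities of the same type get merged.
    """
    tagged = []
    prev = "O"
    for label in labels:
        if label == "O":
            tagged.append("O")
        elif label == prev:
            # Same type as the previous token: continue the current entity.
            tagged.append("I-" + label)
        else:
            # First token of a new entity.
            tagged.append("B-" + label)
        prev = label
    return tagged


pred = ['ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
print(add_iob2_prefixes(pred))
```

Because of the merging assumption, data where two same-type entities sit next to each other still needs real B- boundaries from the annotation.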
In terms of code flow, classification_report calls get_entities to convert the per-token tag sequence into entities.
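Conceptually, entity-level scoring compares sets of (type, start, end) spans rather than individual token tags. The sketch below is a simplified, hypothetical re-implementation for IOB2 tags only (extract_entities is not seqeval's function and ignores its suffix and IOBES handling), applied to the two sequences from the example above:

```python
from typing import List, Set, Tuple

# (sentence index, entity type, start token, end token inclusive)
Entity = Tuple[int, str, int, int]


def extract_entities(sentences: List[List[str]]) -> Set[Entity]:
    """Collect entity spans from IOB2-tagged sentences (simplified sketch).

    Assumption: tags are 'O', 'B-TYPE' or 'I-TYPE'; an 'I-' tag whose type
    differs from the open entity starts a new entity. Any other tag
    (e.g. a bare 'ORGANIZATION') is treated like 'O'.
    """
    entities = set()
    for s, tags in enumerate(sentences):
        start, etype = None, None
        for i, tag in enumerate(tags + ["O"]):  # sentinel 'O' closes a trailing entity
            prefix, _, type_ = tag.partition("-")
            continues = prefix == "I" and type_ == etype
            if etype is not None and not continues:
                # Close the currently open entity at the previous token.
                entities.add((s, etype, start, i - 1))
                start, etype = None, None
            if prefix in ("B", "I") and not continues:
                # Open a new entity at this token.
                start, etype = i, type_
    return entities


y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]

true_ents = extract_entities(y_true)
pred_ents = extract_entities(y_pred)
correct = true_ents & pred_ents  # exact match on type and span
precision = len(correct) / len(pred_ents)
recall = len(correct) / len(true_ents)
print(precision, recall)
```

Only the PER span matches exactly, so precision and recall both come out as 0.5: the predicted MISC span starts one token early and counts as wrong at the entity level. In this sketch a bare label such as 'ORGANIZATION' is simply skipped.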
With suffix=False, get_entities parses each chunk as follows
(https://github.com/chakki-works/seqeval/blob/master/seqeval/metrics/sequence_labeling.py#L43):
tag = chunk[0]
type_ = chunk.split('-')[-1]
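For the label 'ORGANIZATION' this gives tag = 'O' (just the first character) and type_ = 'ORGANIZATION'.
start_of_chunk and end_of_chunk expect tag to be one of ['B', 'I', 'O', 'E', 'S'], so the token is treated as an outside ('O') token and never starts a chunk.
Hence no ORGANIZATION entity is extracted from your input sequence. PRODUCT, LOCATION and PERSON are still counted only because their first letters ('P', 'L', 'P') are not 'O'.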
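The two parsing lines quoted above can be checked directly on the labels from the report. parse_chunk below is a hypothetical wrapper around exactly those two expressions, not a seqeval function:

```python
from typing import Tuple


def parse_chunk(chunk: str) -> Tuple[str, str]:
    """Mimic the two suffix=False parsing lines (sketch, not seqeval itself)."""
    tag = chunk[0]                # first character; expected to be B/I/O/E/S
    type_ = chunk.split('-')[-1]  # text after the last '-', or the whole label
    return tag, type_


for label in ['ORGANIZATION', 'PRODUCT', 'LOCATION', 'PERSON', 'B-ORGANIZATION']:
    print(label, parse_chunk(label))
```

Only the bare 'ORGANIZATION' label comes out with tag 'O'; with the B- prefix it parses as tag 'B' and type 'ORGANIZATION'.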
Thanks for your help.
I adjusted a small part of the get_entities code you mentioned:
tag = chunk[0]
if chunk == 'ORGANIZATION':
    tag = 'ORG'  # 'ORGANIZATION'[0] is 'O', which would make the token count as outside
type_ = chunk.split('-')[-1]
and it works.
Thanks again.