chakki-works/seqeval

Why can't classification_report count the label "ORGANIZATION"?

iifiigii opened this issue · 2 comments

The result is:

              precision    recall  f1-score   support

     PRODUCT      0.884     0.840     0.862      1007
    LOCATION      0.927     0.760     0.835        50
      PERSON      0.000     0.000     0.000         2

   micro avg      0.885     0.835     0.859      1059
   macro avg      0.884     0.835     0.859      1059

But in the data set I have:
pred: ['ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
true: ['ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']

I believe the pred sequence should be changed to ['B-ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O'],
and the true sequence likewise.
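
If the data set only has bare labels, one way to get prefixed tags is to convert them before evaluation. The following is a minimal sketch, not part of seqeval: `to_iob2` is a hypothetical helper, and it assumes that adjacent tokens with the same label belong to the same entity, which may not hold for your data.

```python
def to_iob2(labels, outside="O"):
    """Convert bare per-token labels (e.g. 'ORGANIZATION') to IOB2 tags.

    Assumption: consecutive tokens with the same label form one entity.
    """
    tagged = []
    prev = outside
    for label in labels:
        if label == outside:
            tagged.append(outside)
        elif label == prev:
            # Same label as the previous token: continue the current entity.
            tagged.append("I-" + label)
        else:
            # Different label from the previous token: begin a new entity.
            tagged.append("B-" + label)
        prev = label
    return tagged


pred = ['ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
print(to_iob2(pred))
# ['B-ORGANIZATION', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
```

The same conversion would need to be applied to both the true and the predicted sequences so that they stay comparable.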

You can have a look at the example shown in
https://github.com/chakki-works/seqeval/blob/master/seqeval/metrics/sequence_labeling.py#L288

>>> y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
>>> y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
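
To illustrate what entity-level scoring does with these two sequences, here is a simplified sketch. It is a stand-in written for this example, not seqeval's implementation: `iob2_entities` is a hypothetical helper that only understands strict IOB2 tags and ignores an I- tag that has no preceding B- tag.

```python
def iob2_entities(tags):
    """Return the set of (type, start, end) spans in one IOB2 tag sequence."""
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ['O']):  # the sentinel closes a trailing entity
        prefix, _, label = tag.partition('-')
        continues = prefix == 'I' and etype is not None and label == etype
        if etype is not None and not continues:
            # The open entity ended on the previous token.
            entities.add((etype, start, i - 1))
            start, etype = None, None
        if prefix == 'B':
            start, etype = i, label
    return entities


y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]
y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER', 'O']]

# Prefix each span with its sentence index so spans from different sentences never collide.
true_ents = {(s,) + e for s, sent in enumerate(y_true) for e in iob2_entities(sent)}
pred_ents = {(s,) + e for s, sent in enumerate(y_pred) for e in iob2_entities(sent)}

correct = len(true_ents & pred_ents)
precision = correct / len(pred_ents)
recall = correct / len(true_ents)
print(precision, recall)  # 0.5 0.5
```

Only the PER entity matches exactly. The predicted MISC span starts one token early, so it counts as wrong even though most of its tokens overlap the true span.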

In terms of the code flow:
classification_report calls get_entities to convert the sequence of tags for the tokens into entities.

In case of suffix=False,
https://github.com/chakki-works/seqeval/blob/master/seqeval/metrics/sequence_labeling.py#L43

            tag = chunk[0]
            type_ = chunk.split('-')[-1]

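For the bare label "ORGANIZATION" this gives tag="O" (the first character of the string) and type_="ORGANIZATION".
start_of_chunk and end_of_chunk expect tag to be one of ['B', 'I', 'O', 'E', 'S'], so the token is read as 'O', i.e. outside any entity.

Hence it fails to extract any entity from the input sequence you provided.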
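
The effect of the two quoted lines can be checked in isolation. This is a small sketch that copies only the suffix=False split shown above into a hypothetical helper, `split_tag`; it does not reproduce the rest of get_entities.

```python
def split_tag(chunk):
    """Apply the suffix=False split quoted above to a single label."""
    tag = chunk[0]                # first character of the label
    type_ = chunk.split('-')[-1]  # text after the last '-', or the whole label if there is none
    return tag, type_


for label in ['ORGANIZATION', 'B-ORGANIZATION', 'O']:
    print(label, split_tag(label))
# ORGANIZATION ('O', 'ORGANIZATION')
# B-ORGANIZATION ('B', 'ORGANIZATION')
# O ('O', 'O')
```

A bare "ORGANIZATION" and a plain "O" both produce the tag 'O', which is why the prefixed form is needed for the entity to be recognised.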

Thanks for your help.
I adjusted a small part of the code in get_entities that you mentioned:

        tag = chunk[0]
        # Special case: keep the bare label from being read as the 'O' tag
        if chunk == 'ORGANIZATION':
            tag = 'ORG'
        type_ = chunk.split('-')[-1]

and it works.
Thanks again.