chakki-works/seqeval

Is there a way to print the confusion matrix

rsuwaileh opened this issue · 5 comments

Hey,

I want to print the FP and FN for my system. I checked the code and it seems you don't use them in the calculation and just use pred_sum and true_sum. Is there an easy way to get these numbers?

Thanks!

I just found this answer. However, this seems to be computed on the token level. Is there a way to get the confusion matrix on the entity level?
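
For now, the closest I can get is comparing exact entity spans extracted with get_entities, along these lines (just a sketch, not sure it is the intended way; entity_confusion_counts is my own helper):

    from seqeval.metrics.sequence_labeling import get_entities

    def entity_confusion_counts(y_true, y_pred):
        # My own helper, not part of seqeval: compare exact (type, start, end) spans.
        true_entities = set(get_entities(y_true))
        pred_entities = set(get_entities(y_pred))
        tp = len(true_entities & pred_entities)    # spans with matching type and boundaries
        fp = len(pred_entities - true_entities)    # predicted spans with no exact match
        fn = len(true_entities - pred_entities)    # gold spans that were missed
        return {'TP': tp, 'FP': fp, 'FN': fn}

There is no meaningful TN at the entity level, and a partially overlapping prediction counts as both an FP and an FN here.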

In the example in the code, you show these numbers:

    Example:
        >>> from seqeval.metrics import performance_measure
        >>> y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'O', 'B-ORG'], ['B-PER', 'I-PER', 'O']]
        >>> y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'O'], ['B-PER', 'I-PER', 'O']]
        >>> performance_measure(y_true, y_pred)
        (3, 3, 1, 4)

But when I run it, I get the following numbers:

    from seqeval.metrics import performance_measure
    y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'O', 'B-ORG'], ['B-PER', 'I-PER', 'O']]
    y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'O'], ['B-PER', 'I-PER', 'O']]
    performance_measure(y_true, y_pred)
    {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 4}

If it's token level, then it should be:
{'TP': 4, 'FP': 1, 'FN': 1, 'TN': 4}
If it's entity level, then it should be:
{'TP': 1, 'FP': ??, 'FN': 1, 'TN': 4}

Can you explain these numbers?
How is a partial match handled?
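
For what it's worth, the dict I get is consistent with token-level counting along these lines (my own reconstruction from the output, not necessarily the library's exact code):

    # Reconstruction that reproduces {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 4}
    # on the example above; token_counts is my own helper.
    def token_counts(y_true, y_pred):
        t = [tag for sent in y_true for tag in sent]   # flatten sentences
        p = [tag for sent in y_pred for tag in sent]
        tp = sum(a == b and a != 'O' for a, b in zip(t, p))    # correct non-O tag
        fp = sum(a != b and b != 'O' for a, b in zip(t, p))    # wrong non-O prediction
        fn = sum(a != 'O' and b == 'O' for a, b in zip(t, p))  # non-O tag predicted as O
        tn = sum(a == b == 'O' for a, b in zip(t, p))          # correct O
        return {'TP': tp, 'FP': fp, 'FN': fn, 'TN': tn}

Under that reading, a partial match like B-MISC vs I-MISC is counted only as an FP (not as a TP or FN), which would explain the dict.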

I have the same question: how can we calculate a confusion matrix using the seqeval library?

I have the same question. I am working on token classification and the results are confusing:

{'eval_loss': 1.503118872642517, 'eval_precision': 0.2734958710184821, 'eval_recall': 0.16045680009228286, 'eval_f1': 0.20225372591784804, 'eval_accuracy': 0.8713822804442352, 'eval_runtime': 73.1268, 'eval_samples_per_second': 59.937, 'epoch': 17.0}

Eval accuracy is high, but precision, recall, and F1 are very low. It seems there might be a bug in how the scores are computed at the entity level.

@mirfan899 It's normal: in token classification, the number of O labels is much higher than the number of B labels.

To complement what @zingxy said, accuracy is just "of all tokens, how many did I guess right?", with the O class included. This makes it easy to get very high accuracies, since most tokens are usually O.

On the other hand, the F1 score reported here is the micro average over the entity classes, without taking the O class into account. Check the numbers in the classification report.
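
For example, on the toy labels from the docstring above, something like this prints the per-entity breakdown (a minimal sketch):

    from seqeval.metrics import classification_report

    y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'O', 'B-ORG'], ['B-PER', 'I-PER', 'O']]
    y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'O'], ['B-PER', 'I-PER', 'O']]
    print(classification_report(y_true, y_pred))
    # precision/recall/F1 are reported per entity type (MISC, ORG, PER)
    # plus micro/macro/weighted averages; O does not get a row.

Comparing those per-type rows with the overall token accuracy usually makes the gap much less surprising.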