wikilinks/neleval

Consider adding LEA to coref evaluation metrics?

LEA appears to define its P (R) as macro-averaged P (R) over pairs, weighted by entity size (asymmetrically, such that the recall is weighted by entity prevalence in the gold standard), with the exception that singleton clusters are treated as a single pair. (Is that correct, @ns-moosavi?)

I'm not sure whether LEA is used in practice yet. In particular, I have my doubts about how principled the handling of singletons is. It would be more consistent to use link(n) = n^2 / 2 instead of n(n-1)/2, so that every mention is granted its singleton link. But this would be identical to B-cubed, if I'm not mistaken.

(1) Yes, that is correct.
Self-links, i.e. singleton links, are not exactly the same as single pair links though.
Self-links are only considered for mentions which do not have any links to other mentions.
If a singleton is connected to another mention in the system output, the resolution score is zero.
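
A minimal sketch of the behavior described in (1), assuming entities are represented as Python sets of mention identifiers. The function names `link` and `lea_directional` are hypothetical and are not part of neleval or of the reference LEA scorer. Calling it with the gold entities as `key` gives recall; swapping the arguments gives precision.

```python
def link(n):
    """Number of coreference links among n mentions: n(n-1)/2."""
    return n * (n - 1) // 2


def lea_directional(key, response):
    """LEA score of `key` entities against `response` entities.

    Each key entity is weighted by its size (its "importance").
    A singleton earns its self-link only if the response also
    contains it as a singleton; otherwise its score is zero.
    """
    weighted_sum = 0.0
    total_weight = 0
    for k in key:
        if len(k) == 1:
            # Self-link: rewarded only when the mention is also alone in the response.
            resolution = 1.0 if any(r == k for r in response) else 0.0
        else:
            common_links = sum(link(len(k & r)) for r in response)
            resolution = common_links / link(len(k))
        weighted_sum += len(k) * resolution
        total_weight += len(k)
    return weighted_sum / total_weight if total_weight else 0.0


gold = [{"a", "b", "c"}, {"d"}]
system = [{"a", "b"}, {"c", "d"}]

recall = lea_directional(gold, system)     # 0.25
precision = lea_directional(system, gold)  # 0.5
print(recall, precision)
```

Here the gold singleton `d` is attached to `c` in the system output, so it contributes nothing to recall even though the mention itself was found.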

(2) The issue with your second point is that there is a lot of disagreement about how to handle singletons in coreference resolution.
There are different annotation schemes for various coreference corpora, e.g. MUC, ACE, CoNLL, and SemEval-2010, in which singletons are handled differently.
The existence of these distinct annotation schemes indicates that there is disagreement about how to define coreference relations and the role of singletons in coreference resolution.

Our point of view in assigning self-links to singletons is that detecting all mentions that refer to an entity, i.e. either singletons or coreferent mentions, is much easier than recognizing coreferent mentions.
For instance, all singletons in the SemEval-2010 corpus are annotated automatically, using heuristics.

We designed LEA so that it does not reward the recognition of referring expressions regardless of their coreference decisions.
The recognition of a singleton is rewarded only if it is also recognized as a singleton in the system output, not when it is assigned to some random output entity.
Similarly, the recognition of a coreferent mention is rewarded only if it is linked to at least one of its correct coreferring mentions, not when it is somewhere in the system output either as a singleton or a coreferent mention.
If you equally consider self-links for all entities, the existence of any referring expression, singleton or coreferent, in the system output will be rewarded no matter how it is classified.

However, one may disagree with our point of view and want to treat all entities, singletons or coreferent entities, equally.
In that case the resolution score would be resolution_score = n(n-1)/2 + 1, and there would be no special self-links for singletons.
As long as one uses links instead of mentions, LEA is different from B3.
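
To illustrate the last point, here is a rough sketch comparing a link-based recall using the n(n-1)/2 + 1 count with a mention-based B-cubed recall on one toy example. It rests on an assumed reading of the variant: the count is applied uniformly to every entity and to every non-empty overlap between a gold and a system entity, and an empty overlap counts as zero. All function names are hypothetical.

```python
def link_plus_one(n):
    # Assumed reading: n(n-1)/2 + 1 for any non-empty set, 0 for an empty one.
    return n * (n - 1) // 2 + 1 if n > 0 else 0


def link_based_recall(gold, system, link):
    """Size-weighted fraction of each gold entity's links found in the system output."""
    weighted = sum(
        len(k) * sum(link(len(k & r)) for r in system) / link(len(k))
        for k in gold
    )
    return weighted / sum(len(k) for k in gold)


def b_cubed_recall(gold, system):
    """Mention-based recall: average of |k & r| / |k| over gold mentions."""
    total, count = 0.0, 0
    for k in gold:
        for mention in k:
            # System entity containing this mention (empty set if it is missing).
            r = next((r for r in system if mention in r), set())
            total += len(k & r) / len(k)
            count += 1
    return total / count


gold = [{"a", "b", "c"}, {"d"}]
system = [{"a", "b"}, {"c", "d"}]

variant = link_based_recall(gold, system, link_plus_one)  # 0.8125
b_cubed = b_cubed_recall(gold, system)                    # roughly 0.667
print(variant, b_cubed)
```

On this example the two scores differ, which is consistent with the claim above, though a single example is of course not a proof.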

(3) LEA is designed for intra-document coreference evaluation.
For each entity, it considers the "importance" of the entity in the document and the "resolution score" of the entity within the document, i.e. intra_resolution_score.
If you want to adapt LEA for coreference resolution in the context of entity linking, I believe it makes sense to also incorporate an inter_resolution_score, i.e. whether the entity is resolved correctly to a knowledge base ID.
There could be various ways of combining inter_resolution_score and intra_resolution_score.
For instance:

LEA = \sum_{entities} (importance * intra_resolution_score * (W_1 + inter_resolution_score)) / \sum_{entities} ((1 + W_1) * importance)

or

LEA = \sum_{entities} importance * (W_2 * inter_resolution_score + W_3 * intra_resolution_score) / \sum_{entities} importance
where W_2+W_3 = 1
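
Both combinations can be sketched as follows, assuming each entity has already been reduced to a hypothetical `(importance, intra_resolution_score, inter_resolution_score)` tuple with both scores in [0, 1]. The function names and the example values are illustrative only.

```python
def lea_multiplicative(entities, w1):
    """First combination: intra score scaled by (W_1 + inter), normalized by (1 + W_1)."""
    numerator = sum(imp * intra * (w1 + inter) for imp, intra, inter in entities)
    denominator = sum((1 + w1) * imp for imp, _, _ in entities)
    return numerator / denominator


def lea_additive(entities, w2):
    """Second combination: weighted sum of inter and intra scores, with W_3 = 1 - W_2."""
    w3 = 1 - w2
    numerator = sum(imp * (w2 * inter + w3 * intra) for imp, intra, inter in entities)
    denominator = sum(imp for imp, _, _ in entities)
    return numerator / denominator


# (importance, intra_resolution_score, inter_resolution_score)
entities = [(3, 0.5, 1.0), (1, 1.0, 0.0)]

print(lea_multiplicative(entities, w1=1.0))  # 0.5
print(lea_additive(entities, w2=0.5))        # 0.6875
```

In the first form, an entity whose inter_resolution_score is 1 contributes exactly its intra-document score, so if every entity is linked to the correct knowledge base ID the result reduces to the intra-document LEA score. In the second form, an entity can still earn credit for a correct knowledge base ID even when its intra-document resolution score is zero.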