hipe-eval/HIPE-scorer

Evaluation Measures: Understanding of macro average

Closed issue · 0 comments

Micro P, R, F1:

  • P, R, F1 on entity level (not on token level): micro average (= counts pooled over all documents)
    • strict and fuzzy (= at least 1 token overlap)
    • separately per type and cumulative for all types
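
For reference, a minimal sketch of the entity-level micro evaluation with strict and fuzzy matching. Entities are assumed to be `(type, start, end)` token spans pooled across all documents; the function names are illustrative, not the scorer's actual API.

```python
def overlaps(a, b):
    """At least one token of overlap between spans (start, end), end exclusive."""
    return a[0] < b[1] and b[0] < a[1]

def micro_prf(gold, pred, fuzzy=False):
    """Micro P/R/F1 on entity level, pooled over all documents.

    gold, pred: lists of (type, start, end) tuples.
    fuzzy=True counts a prediction as correct if the type matches and the
    span overlaps a still-unmatched gold entity by at least one token.
    """
    matched = set()
    tp = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i in matched or p[0] != g[0]:
                continue
            if p[1:] == g[1:] or (fuzzy and overlaps(p[1:], g[1:])):
                matched.add(i)
                tp += 1
                break
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Restricting `gold` and `pred` to a single entity type gives the per-type figures; leaving them mixed gives the cumulative ones.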

Macro as document-level average of micro P, R, F1

  • P, R, F1 on entity level (not on token level): doc-level macro average (= average of the separate micro evaluations on each document)
    • strict and fuzzy (= at least 1 token overlap)
    • separately per type and cumulative for all types
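
The doc-level macro can be sketched as follows: run the micro evaluation per document, then take the unweighted mean, so a short document counts as much as a long one. The input shape (per-document `(tp, n_pred, n_gold)` triples) is an assumption for illustration, not the scorer's internal representation.

```python
def doc_macro_prf(doc_counts):
    """Doc-level macro P/R/F1.

    doc_counts: one (tp, n_pred, n_gold) triple per document.
    Computes micro P/R/F1 within each document, then averages the
    per-document scores with equal weight.
    """
    ps, rs, fs = [], [], []
    for tp, n_pred, n_gold in doc_counts:
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f)
    n = len(doc_counts)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n
```

Note the contrast with micro: pooling the same counts (tp=2, pred=3, gold=4) would give P=2/3, R=1/2, whereas the macro below averages the two documents' scores directly.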

@e-maud @mromanello: The following type-oriented macro average can be computed directly from the per-type output of the micro P, R, F1 evaluation (spreadsheet style). The scorer therefore does not need to compute it itself (for now, at least).

Macro as average over type-specific P, R, F1 measures

  • P, R, F1 on entity level (not on token level): type-level macro average (= average of the separate per-type micro P, R, F1 figures)
    • strict and fuzzy (= at least 1 token overlap)
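
As noted above, this is a one-liner over the exported per-type figures; a sketch, assuming the per-type micro scores are available as a mapping from entity type to (P, R, F1):

```python
def type_macro(per_type):
    """Type-level macro: unweighted mean of per-type micro (P, R, F1) scores,
    e.g. taken from the scorer's per-type output (spreadsheet style)."""
    n = len(per_type)
    return tuple(sum(scores[i] for scores in per_type.values()) / n
                 for i in range(3))
```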