Color scheme in tables doesn't respect metric ranges.
g8a9 commented
When we visualize faithfulness and plausibility metrics, we use a color scheme where, generally, the darker the cell, the better.
However, color ranges are unclear and seem to be different for faithfulness and plausibility metrics.
E.g., see below
Here `aopc_suff` has the worst value with 1 and is correctly displayed in white, but plausibility metrics based on F1 scores (e.g., `token_f1_plau`) are not shown in white when they are 0 (worst value).
I think we should match the color ranges to the metric ranges, and also add an extensive user guide to the docs to clarify the matter.
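
To make the proposal concrete, here is a minimal sketch of what "matching color and metric ranges" could look like. It is not the library's current code: `METRIC_RANGES` and `color_intensity` are hypothetical names, and the best value of 0 for `aopc_suff` is an assumption (only its worst value of 1 is taken from the example above).

```python
# Hypothetical per-metric ranges as (worst value, best value).
# These names and ranges are illustrative assumptions, not the library's API.
METRIC_RANGES = {
    "aopc_suff": (1.0, 0.0),      # lower is better; worst = 1, best = 0 is assumed
    "token_f1_plau": (0.0, 1.0),  # F1 score; worst = 0, best = 1
}


def color_intensity(metric: str, value: float) -> float:
    """Map a metric value to a color intensity in [0, 1].

    0.0 means white (worst value), 1.0 means darkest (best value),
    regardless of whether higher or lower is better for the metric.
    """
    worst, best = METRIC_RANGES[metric]
    # Linear rescaling; the sign of (best - worst) handles the direction.
    score = (value - worst) / (best - worst)
    # Clamp values that fall outside the declared range.
    return min(1.0, max(0.0, score))


if __name__ == "__main__":
    for metric, value in [("aopc_suff", 1.0), ("token_f1_plau", 0.0), ("token_f1_plau", 1.0)]:
        print(metric, value, color_intensity(metric, value))
```

With a fixed range per metric, the worst value always maps to white, so a `token_f1_plau` of 0 would be rendered the same way as an `aopc_suff` of 1. If the tables are styled with pandas, the same idea could probably be expressed by passing explicit `vmin` and `vmax` per column to `Styler.background_gradient` instead of letting the range be inferred from the data.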