g8a9/ferret

Color scheme in tables doesn't respect metric ranges


g8a9 commented

When we visualize faithfulness and plausibility metrics, we use a color scheme where, generally, the darker the cell, the better the score.
However, the color ranges are unclear and seem to differ between faithfulness and plausibility metrics.

For example, see the table below:

[Image: table of faithfulness and plausibility metric scores]

Here, `aopc_suff` takes its worst value, 1, and is correctly displayed in white. However, plausibility metrics based on F1 scores (e.g., `token_f1_plau`) are not displayed in white when they are 0, which is their worst value.

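I think we should match the color range to each metric's range, so that the worst value of every metric is shown in white and the best in the darkest shade. We should also add an extensive user guide to the docs to clarify how the colors should be read.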
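A minimal sketch of what matching color and metric ranges could look like, assuming each metric declares its own worst and best value and the color intensity is computed against that declared range rather than against the values that happen to appear in the table. The names `MetricRange`, `METRIC_RANGES`, `color_intensity` and `to_hex_shade` are hypothetical and not part of ferret's API. The only values taken from this issue are that 1 is the worst `aopc_suff` score and 0 the worst `token_f1_plau` score; the best value of 0 for `aopc_suff` and the base color are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRange:
    """Hypothetical description of a metric's scale."""
    worst: float
    best: float


# ASSUMED ranges for illustration only: the real bounds should come from the
# metric definitions. The issue states only that 1 is the worst aopc_suff
# value and 0 is the worst token_f1_plau value (F1 lies in [0, 1]).
METRIC_RANGES = {
    "aopc_suff": MetricRange(worst=1.0, best=0.0),      # lower is better
    "token_f1_plau": MetricRange(worst=0.0, best=1.0),  # higher is better
}


def color_intensity(metric: str, value: float) -> float:
    """Map a score to [0, 1]: 0 means white (worst), 1 means darkest (best)."""
    r = METRIC_RANGES[metric]
    # Works for both directions, since worst/best encode the ordering.
    t = (value - r.worst) / (r.best - r.worst)
    # Clamp so that out-of-range scores still yield a valid color.
    return max(0.0, min(1.0, t))


def to_hex_shade(intensity: float, rgb: tuple = (31, 119, 180)) -> str:
    """Blend linearly from white to an (assumed) base color by `intensity`."""
    channels = (round(255 + (c - 255) * intensity) for c in rgb)
    return "#{:02x}{:02x}{:02x}".format(*channels)


if __name__ == "__main__":
    for metric, value in [("aopc_suff", 1.0), ("token_f1_plau", 0.0),
                          ("aopc_suff", 0.0), ("token_f1_plau", 1.0)]:
        shade = to_hex_shade(color_intensity(metric, value))
        print(f"{metric}={value}: {shade}")
```

With this mapping, both `aopc_suff = 1` and `token_f1_plau = 0` come out as `#ffffff`, so the worst value of each metric is white regardless of whether that metric is "higher is better" or "lower is better". Normalizing against each metric's declared range (instead of the min/max observed in a given table) also keeps shades comparable across different explainers and examples.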