✨ Fine grained evaluations

Question

asim-shrestha opened this issue 10 months ago · 1 comments

Suppose an eval expects an output list of 100 items. At test time you get all 100 expected items, plus one extra. In this case the test would fail.

There could also be a case where we get only 99 of the 100 items. This would still be reported as a single failure.

We should have more fine-grained failures in these cases, so the user can see that the output contained 100% of what it was supposed to retrieve (with one extra item), or that it was missing just a single item. We should then calculate a score based on this.
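
A rough sketch of what such a score could look like, assuming the expected and actual outputs are flat lists of hashable items. The names `evaluate_list` and `ListEvalResult` are hypothetical and only for illustration; they are not this project's API, and the choice of recall, precision, and F1 as the combined score is an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ListEvalResult:
    """Hypothetical result type for a fine-grained list comparison."""
    recall: float     # fraction of expected items that were returned
    precision: float  # fraction of returned items that were expected
    missing: list     # expected items that were not returned
    extra: list       # returned items that were not expected

    @property
    def score(self) -> float:
        # Assumption: combine recall and precision with F1 (harmonic mean).
        total = self.recall + self.precision
        return 0.0 if total == 0 else 2 * self.recall * self.precision / total


def evaluate_list(expected: list, actual: list) -> ListEvalResult:
    """Compare two lists as multisets; order is ignored, duplicates count."""
    expected_counts = Counter(expected)
    actual_counts = Counter(actual)

    # Multiset intersection gives the number of correctly returned items.
    matched = sum((expected_counts & actual_counts).values())

    return ListEvalResult(
        recall=matched / len(expected) if expected else 1.0,
        precision=matched / len(actual) if actual else 1.0,
        missing=list((expected_counts - actual_counts).elements()),
        extra=list((actual_counts - expected_counts).elements()),
    )


expected = list(range(100))

# All 100 expected items plus one extra: full recall, slightly lower precision.
one_extra = evaluate_list(expected, expected + [100])

# Only 99 of the 100 expected items: full precision, recall of 0.99.
one_missing = evaluate_list(expected, expected[:99])
```

With this, the first case reports a recall of 1.0 and lists the single extra item, while the second reports a recall of 0.99 and lists the single missing item, instead of both collapsing into one pass/fail result.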

Answer 1 · 2023-11-28T04:29:29.000Z

Fixed in #15