symflower/eval-dev-quality

Log model responses directly to file and reuse them for debugging

bauersimon opened this issue · 1 comment

Goal: be able to reuse the exact responses from a previous run, 1:1, to debug the evaluation logic.

  • log model responses directly to files (either at the provider query-response level or at the test-generation level)
  • add a dummy model that reads these files and responds accordingly (essentially replaying the original model responses)
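
A rough sketch of how the two steps could fit together, written in Python purely for illustration. Every name here (`RecordingModel`, `ReplayModel`, `respond`, the `response-NNNN.json` file layout) is hypothetical and not taken from the project. It assumes a model can be reduced to a function from prompt to response, and that replaying happens in the same order as the original run.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class RecordingModel:
    """Hypothetical wrapper: forwards each prompt and logs the response to a file."""

    def __init__(self, query: Callable[[str], str], log_dir: Path) -> None:
        self.query = query
        self.log_dir = log_dir
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.count = 0

    def respond(self, prompt: str) -> str:
        response = self.query(prompt)
        # One file per response; the zero-padded index keeps the files
        # sortable in call order (for up to 10,000 responses).
        path = self.log_dir / f"response-{self.count:04d}.json"
        record = {"prompt": prompt, "response": response}
        path.write_text(json.dumps(record), encoding="utf-8")
        self.count += 1
        return response


class ReplayModel:
    """Hypothetical dummy model: replays logged responses in recorded order."""

    def __init__(self, log_dir: Path) -> None:
        self.records = [
            json.loads(path.read_text(encoding="utf-8"))
            for path in sorted(log_dir.glob("response-*.json"))
        ]
        self.position = 0

    def respond(self, prompt: str) -> str:
        if self.position >= len(self.records):
            raise LookupError("no recorded response left to replay")
        record = self.records[self.position]
        # Fail loudly if the evaluation asks something the original run did not.
        if record["prompt"] != prompt:
            raise ValueError(f"prompt mismatch at response {self.position}")
        self.position += 1
        return record["response"]
```

Checking the prompt on replay is a design choice: it surfaces cases where a change to the evaluation logic also changed what is sent to the model, in which case the logged responses no longer apply 1:1.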

Duplicate of #204. Closing.