Tool/command to combine multiple evaluations into one

Question

bauersimon opened this issue 3 months ago · 2 comments

If we run evaluations on multiple machines, we want an easy way to combine their results.

1st iteration

Remove all CSVs but evaluation.csv
- Remove the models-summed.csv and <language>-summed.csv files
- Remove all the evaluation CSV parsing logic
- Remove all occurrences of the models' costs and human-readable names
- #256
Let the "report" command also generate the Markdown report
- #258
Reason: we want to move to a more flexible way of handling the evaluation data, i.e. SQL.

2nd iteration

type Report struct {
	EvaluationPaths []string // Holds "evaluation.csv" file paths.
	ResultPath      string   // Directory in which the combined "evaluation.csv" is created.
}

Create an evaluation.csv file in ResultPath that will hold the combined evaluation records
Loop through the EvaluationPaths and find all evaluation.csv files
- Append the CSV records to the overall evaluation.csv

Answer 1 · 2024-06-20T06:08:55.000Z

Current workarounds:

- overall scores: cat docs/reports/<version>/*/models-summed.csv | sort | uniq
- repository-specific scores: cat docs/reports/<version>/*/evaluation.csv | sort | uniq

Answer 2 · 2024-07-11T06:29:03.000Z

I think an eval-dev-quality report command is a great idea.
If we want to obtain, e.g., the real names and costs of the models in the reporting tool, we might need to share some of the command arguments (e.g. the provider tokens). However, I think OpenRouter allows querying model information without a token, so we should be fine for now.
I think @zimmski mentioned he wants to use Postgres now. I'm not sure how that would work, though, since it seems you would need a running database in the background?