Tool/command to combine multiple evaluations into one
bauersimon opened this issue · 2 comments
bauersimon commented
If we run the evaluations on multiple machines, we want to be able to combine their results easily.
TODO
1st iteration
- Remove all CSVs but `evaluation.csv`
  - Remove the `models-summed.csv` and `<language>-summed.csv` files
  - Remove all the evaluation CSV parsing logic
  - Remove all occurrences of the models' costs and human-readable names
  - #256
- Let the "report" command also generate the Markdown report
- Reason: we want to move to a more flexible way of handling the evaluation data, i.e. towards SQL
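The Markdown part of the first iteration could start as little more than rendering the combined `evaluation.csv` records as a table. This is a minimal sketch under that assumption: `markdownTable` is a hypothetical helper (not part of eval-dev-quality), the `model,score` columns in the demo are made up, and the actual report layout is not specified in this issue.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// markdownTable renders CSV records (first row = header) as a GitHub-flavored
// Markdown table. Hypothetical helper: the real report layout is not specified.
func markdownTable(records [][]string) string {
	if len(records) == 0 {
		return ""
	}
	var b strings.Builder
	writeRow := func(row []string) {
		b.WriteString("|")
		for _, cell := range row {
			// Escape pipes so that cell contents cannot split a column.
			b.WriteString(" " + strings.ReplaceAll(cell, "|", `\|`) + " |")
		}
		b.WriteString("\n")
	}
	writeRow(records[0]) // Header row.
	separator := make([]string, len(records[0]))
	for i := range separator {
		separator[i] = "---"
	}
	writeRow(separator)
	for _, row := range records[1:] {
		writeRow(row)
	}
	return b.String()
}

func main() {
	// Made-up columns; the real evaluation.csv has its own header.
	data := "model,score\nmodel-a,10\nmodel-b,7\n"
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		panic(err)
	}
	fmt.Print(markdownTable(records))
}
```

GitHub-flavored Markdown requires the `---` separator row directly after the header row; everything else is plain string building, so the same records could later come from an SQL query instead of a CSV file.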
2nd iteration
- Create a new command `eval-dev-quality report`
  ```go
  type Report struct {
  	EvaluationPaths []string // Holds "evaluation.csv" file paths.
  	ResultPath      string   // Directory that receives the combined "evaluation.csv".
  }
  ```
  - Create an `evaluation.csv` file in `ResultPath` that will hold the combined evaluation records
  - Loop through the `EvaluationPaths` and find all `evaluation.csv` files
    - Append the CSV records to the overall `evaluation.csv`
bauersimon commented
Current workarounds:
- `cat docs/reports/<version>/*/models-summed.csv | sort | uniq` for overall scores
- `cat docs/reports/<version>/*/evaluation.csv | sort | uniq` for repository-specific scores
bauersimon commented
- I think `eval-dev-quality report` is great. If we want to obtain e.g. the real names and costs of the models in the reporting tool, we might need to share some of the command arguments (e.g. the provider tokens). Though I think OpenRouter allows querying that information without a token, so we should be good for now.
- I think @zimmski mentioned he wanted to use `postgres` now... I'm not sure how that would work, though, since it seems you would need a running DB in the background for that?