symflower/eval-dev-quality

Infer if a model produced "too much" code

bauersimon opened this issue · 0 comments

#32 (comment)

We want to detect whether a model's response contains additional code that we did not request (e.g. benchmarks) and incorporate that into the assessment/scoring.
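
One possible way to approach this for Go responses is sketched below. It parses the generated file with the standard `go/parser` package and collects top-level functions whose names mark them as something other than a plain test. The function name `findUnrequestedFunctions`, the prefix list `unrequestedPrefixes`, and the file name `response_test.go` are hypothetical and not part of the existing code base; how the result should feed into the score is left open.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// unrequestedPrefixes lists name prefixes of top-level functions that the
// task did not ask for. The exact list is an assumption of this sketch.
var unrequestedPrefixes = []string{"Benchmark", "Example", "Fuzz"}

// findUnrequestedFunctions parses Go source code and returns the names of all
// top-level functions (methods are skipped) whose names start with one of the
// unrequested prefixes.
func findUnrequestedFunctions(source string) ([]string, error) {
	fset := token.NewFileSet()
	// "response_test.go" is a placeholder file name used only for error positions.
	file, err := parser.ParseFile(fset, "response_test.go", source, 0)
	if err != nil {
		return nil, err
	}

	var names []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil {
			continue // Not a function declaration, or a method.
		}
		for _, prefix := range unrequestedPrefixes {
			if strings.HasPrefix(fn.Name.Name, prefix) {
				names = append(names, fn.Name.Name)
				break
			}
		}
	}

	return names, nil
}
```

A response containing `TestFoo` and `BenchmarkFoo` would yield only `BenchmarkFoo`. The length of the returned slice could then be recorded as a separate metric or used as a penalty. Matching on name prefixes is a heuristic: it does not catch extra helper functions or types, which would need a different check.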