symflower/eval-dev-quality
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
Go · MIT License
Issues
- Collect Go coverage if tests trigger panic (#175)
- Deal with dependencies requested by LLMs (#174)
- LLM result parsing bug (#173)
- Improve maintainability of assessments (#169)
- Evaluation task: Code repair (#168)
- Support multiple evaluation tasks (#165)
- Deal with failing tests (#158)
- Repository not reset for multiple tasks (#147)
- Java (#143)
- Test for pulling Ollama model is flaky (#135)
- Follow-up: Allow to retry a model when it errors (#131)
- Give models a retry on error (#123)
- Multiple runs without interleaving (#119)
- Fixed Ollama version (#117)
- Generic OpenAI API provider (#111)
- Multiple Runs (#108)
- Measure Model response time (#105)
- Follow up: Ollama Support (#100)
- Integrate Ollama (#91)
- Add linters where each error is a metric (#81)
- Roadmap for v0.5.0 (#79)
- Fix svg Y axis ticks (#73)
- Java language implementation (#61)
- Don't hardcode test file path (#58)
- Move more output into the logs (#52)