Code-Eval

Evaluate the code-generation ability of code LLMs. Supports mainstream benchmarks such as HumanEval and MBPP. Chat templates and pre-/post-processing code are defined as interfaces, so you can customize the logic for extracting valid code snippets from your own model's outputs for testing, as sketched below.
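
As a rough illustration, here is a minimal sketch of what such an interface could look like. The `ModelAdapter` class, its method names, and the chat tokens are hypothetical assumptions for this example, not the repository's actual API:

```python
import re
from abc import ABC, abstractmethod


class ModelAdapter(ABC):
    """Hypothetical interface: wraps prompt construction and code extraction for one model."""

    @abstractmethod
    def apply_chat_template(self, prompt: str) -> str:
        """Pre-processing: wrap the raw benchmark prompt in the model's chat format."""

    @abstractmethod
    def extract_code(self, completion: str) -> str:
        """Post-processing: pull the runnable code snippet out of the raw model output."""


class MarkdownFenceAdapter(ModelAdapter):
    """Example adapter for a chat model that replies with fenced ```python blocks."""

    def apply_chat_template(self, prompt: str) -> str:
        # Assumed chat markers; replace with your model's actual template.
        return f"<|user|>\nComplete the following function:\n{prompt}\n<|assistant|>\n"

    def extract_code(self, completion: str) -> str:
        # Prefer the first fenced code block; fall back to the raw completion.
        match = re.search(r"```(?:python)?\n(.*?)```", completion, re.DOTALL)
        return match.group(1) if match else completion
```

An adapter like this would then be applied per sample: build the prompt with `apply_chat_template`, generate with your model, and pass the output through `extract_code` before running the benchmark's tests.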

This repository is no longer actively maintained.