gersteinlab/ML-Bench
The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://arxiv.org/abs/2311.09835)
Python · MIT
Issues
- 1: The GPT3.5 result could not be reproduced (#3 opened by iiinsight)
- 4: How to evaluate local custom models? (#2 opened by VoiceBeer)
- 0: Any public leaderboards? (#1 opened by zhimin-z)