2timesjay/human-eval
Harness for experiments on the human-eval dataset. Based on the code for the paper "Evaluating Large Language Models Trained on Code".
Language: Python. License: MIT.