scicode-bench/SciCode

A benchmark that challenges language models to code solutions for scientific problems

Python · Apache-2.0

Issues

Add gpt-4.5 to the leaderboard
#37 opened 8 months ago by LuigiPagani
0
Benchmark o1
#35 opened 8 months ago by LuigiPagani
0
Benchmark Claude Sonnet 3.7
#36 opened 8 months ago by LuigiPagani
0
Improve discoverability of your work on Hugging Face
#5 opened 9 months ago by NielsRogge
3
add new claude-3-5-sonnet-20241022
#18 opened 9 months ago by Kreijstal
0
Add o1 17/12 To the Leaderboard
#21 opened 9 months ago by LuigiPagani
2
One click run
#6 opened 9 months ago by ofirpress
2
Deepseek R1 Evaluation Results?
#25 opened 9 months ago by jasonzliang
2
Ground Truth Code for problems_all.jsonl
#20 opened 9 months ago by jasonzliang
2
o1 models results
#17 opened a year ago by stalkermustang
1
Solution generation skips certain steps?
#12 opened a year ago by idavidrein
2
Request to evaluate the new O1 models by OpenAI (O1-preview and O1-mini)
#14 opened a year ago by Belzedar94
2
Potentially Too Strict Judgement on Calculated Result
#8 opened a year ago by XuGW-Kevin
2
Inquiry on Skipping Specific Problems in SciCode Benchmark
#1 opened a year ago by HariSeldon0
1
Dependency versions for evaluation
#2 opened a year ago by zrait
0