nuprl/MultiPL-E

A multi-programming language benchmark for LLMs

Language: Python · License: NOASSERTION

Issues

Task HumanEval/092 has contradictory tests in Rust (#142, opened a month ago by geajack; 5 comments)
leetcode dataset not found (#134, opened 2 months ago by way2swaggy; 1 comment)
Citation for the LeetCode Dataset (#140, opened 2 months ago by JJGO; 1 comment)
How to run a multi-GPU model for inference testing? (#141, opened 2 months ago by huangmenglong; 2 comments)
Quantized model is not supported - Calling cuda() is not supported for 4-bit or 8-bit quantized models (#137, opened 3 months ago by Santhoshkumar-p; 2 comments)
The "automodel" file does not have this parameter. (#131, opened 5 months ago by shuaiwang2022; 0 comments)
code generated with wrong end of string place (#128, opened 5 months ago by tedvuminhhuy; 2 comments)
padding left some token causing compile error (#127, opened 5 months ago by tedvuminhhuy; 2 comments)
Unable to load weights from pytorch checkpoint file (#124, opened 6 months ago by LeVuMinhHuyWindows; 1 comment)
Evaluation with a container stops halfway without error message (#120, opened 7 months ago by Chen-Hailin; 4 comments)
All non-multiline commented prompts currently broken (#114, opened 8 months ago by cassanof; 0 comments)
R prompts are currently broken (#111, opened 8 months ago by arjunguha; 0 comments)
Turn translator into a library (#82, opened 10 months ago by arjunguha; 1 comment)
Could I get all statistics? (#91, opened a year ago by sh0416; 2 comments)
load_dataset() doesn't work without specifying dataset_revision? (#90, opened a year ago by ShushanArakelyan; 1 comment)
Error evaluating TS/Java (#89, opened a year ago by memray; 3 comments)
Strange Scala Unit test Translation for output Tuple Type w/ extra parenthesis (#65, opened a year ago by PootieT; 4 comments)
Add HumanEval+ tests (#62, opened a year ago by Randl; 15 comments)
Scala tests comparing optional value (#64, opened a year ago by PootieT; 3 comments)
Small issues with Swift prompt signatures (#63, opened a year ago by PootieT; 3 comments)
C# test sequence equality (#71, opened a year ago by PootieT; 1 comment)
PHP test indexed array comparison (#72, opened a year ago by PootieT; 0 comments)
Perl Unit test comparing float values (#67, opened a year ago by PootieT; 4 comments)
Perl Unit test when expecting "False/0" output (#66, opened a year ago by PootieT; 5 comments)
Environment for evaluating C# (#34, opened a year ago by memray; 13 comments)
Racket unit test numerical equivalence (#60, opened a year ago by PootieT; 4 comments)
R unit test comparison between integer and double (#55, opened a year ago by PootieT; 8 comments)
R unit tests atomic vector comparison (#50, opened a year ago by PootieT; 7 comments)
C++ test float comparison (#51, opened a year ago by PootieT; 3 comments)
Java transpiled test failing with optional output (#47, opened a year ago by PootieT; 2 comments)
Stop tokens for Java do not allow completions that produce several top-level methods. (#59, opened a year ago by PootieT; 1 comment)
Reported pass@k silently wrong for n<k (#40, opened a year ago by daniel-vainsencher; 1 comment)
Warning: Bash performance results artificially low (#56, opened a year ago by PootieT; 5 comments)
Support execution for the FIM benchmarks (#29, opened a year ago by arjunguha; 1 comment)
FYI, HumanEval 95 check dict case canonical solution is wrong (#52, opened a year ago by PootieT; 1 comment)
Java program evaluation error with javatuples.Pair class (#45, opened a year ago by PootieT; 2 comments)
Fix some easy paths to wrong evaluation results (#31, opened a year ago by daniel-vainsencher; 5 comments)
Doesn't check for cuda support before attempting to execute on GPU. (#30, opened a year ago by esslushy; 1 comment)
Better isolation for evaluation (#19, opened 2 years ago by arjunguha; 1 comment)
merge code from cd58e0314c9fefe8095149f07f9004248d1dbf94 (#14, opened 2 years ago by arjunguha; 0 comments)
Bad filenames when using gz (#16, opened 2 years ago by arjunguha; 0 comments)