Issues
Is there something wrong with 'google/gemma-1.1-2b-it' ?
#1854 opened by rangehow - 2
IndexError: list index out of range when running benchmark on gguf model
#1768 opened by fherrmannsdoerfer - 1
Task description newline characters removed by Jinja templating, affecting generated requests and performance
#1817 opened by ma0li - 5
llama3 baseline reproduction problem
#1799 opened by fmm170 - 8
I get this error whenever I try to run an eval: ImportError: cannot import name 'HfApi' from 'huggingface_hub'
#1826 opened by menhguin - 2
Error when evaluating math.
#1819 opened by SefaZeng - 1
Evaluate encoder-decoder-models
#1840 opened by Bachstelze - 1
How to evaluate a large model like llama-65B?
#1804 opened by fayuge - 4
`--tasks list` does not work
#1850 opened by Sunt-ing - 3
SyntaxError when import lm_eval
#1820 opened by mxjmtxrm - 0
How to use Zeno
#1842 opened by DavidAdamczyk - 3
Errors when loading exact_match.py
#1830 opened by twxin - 1
--device cuda:3 not honored when using --model vllm
#1846 opened by LGLG42 - 0
Multi Label Classification
#1814 opened by IsraelAbebe - 2
MPS backend out of memory evaluating fine-tuned Mixtral-8x7B-Instruct-v0.1 on a machine with 100+ GB
#1835 opened by chimezie - 2
Bug: wrong `until` default value for chat based model
#1837 opened by YilunZhou - 0
Inconsistent evaluation results with Chat Template
#1841 opened by shiweijiezero - 1
AssertionError: aggregation named 'mean' conflicts with existing registered aggregation!
#1839 opened by hunter2009pf - 3
Using Language Models as Evaluators
#1831 opened by lintangsutawika - 0
how to run all the bigbench tasks at once?
#1809 opened by kbmlcoding - 0
sha256 for datasets or samples
#1836 opened by artemorloff - 1
Multi-round evaluation for chat models
#1816 opened by YilunZhou - 2
The input format for XNLI seems weird?
#1822 opened by SefaZeng - 2
Exclude all current tasks
#1801 opened by YilunZhou - 2
Avoid slow testing due to network issues.
#1824 opened by pixeli99 - 0
eval gsm8k from local dataset folder with the bug info "ValueError: BuilderConfig 'main' not found."
#1829 opened by Jp-17 - 0
Add More Tests
#1827 opened by haileyschoelkopf - 1
Support Mamba based models for evaluation tasks
#1812 opened by NamburiSrinath - 1
when MMLU eval, num_few_shot=5, more GPU overhead
#1818 opened by chunniunai220ml - 0
TypeError: 'NoneType' object is not iterable when using cache and loglikelihood_rolling
#1821 opened by mdocekal - 4
how to evaluate on boolq? incorrect results
#1813 opened by sidhantls - 7
Hugging Face: Open LLM Leaderboard: how do I reproduce results for details_gpt2 repository
#1802 opened by CoconutJJ - 0
Gemini 1.5/Ultra support
#1808 opened by notrichardren - 1
Add NPU support for huggingface.py
#1797 opened by jiaqiw09 - 1
Same results - different models
#1771 opened by aleksoren - 2
Sorting task output alphabetically
#1774 opened by ad8e - 1
error in eval-tracker : 'Namespace' object has no attribute 'push_results_to_hub'
#1778 opened by abgoswam - 0
Support loading slices of a split from a dataset
#1788 opened by alexrs - 1
Data preprocess is slow for mmlu
#1781 opened by ThisisBillhe - 2
Support OpenAI's Batch API
#1770 opened by djstrong - 0
Cannot have both a group list and task list
#1767 opened by steven-basart