huggingface/lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

Python · MIT

Issues

How to run a 30B+ model with lighteval when `accelerate launch` fails with OOM?
#155 opened a month ago by xiechengmude
4
Version of a task should be configurable.
#172 opened a month ago by PhilipMay
6
Add Sympy equivalence for MATH / GSM8K?
#170 opened 2 months ago by lewtun
1
Evaluate EncoderDecoderModels
#183 opened a month ago by Bachstelze
0
DROP Evaluation with Llama3 (vs. lm-evaluation-harness)
#165 opened 2 months ago by vipulraheja
1
Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!
#176 opened a month ago by gody7334
1
Performance compared to lm-evaluation-harness
#179 opened a month ago by geoalgo
6
Error: `ModuleNotFoundError: No module named 'openai'`.
#175 opened a month ago by PhilipMay
4
[New Task] Add AlpacaEval LC
#139 opened 2 months ago by YannDubs
8
add transformers model to be used as judge
#153 opened 2 months ago by NathanHB
2
Do an intro notebook on how to use `lighteval`
#143 opened 2 months ago by clefourrier
2
`Could not initialize the JudgeOpenAI model` and `openi` import error
#166 opened 2 months ago by lewtun
1
Expose a few model predictions / gold answers in the logs
#164 opened 2 months ago by lewtun
0
[MATH] Too many values to unpack (expected 2)
#140 opened 2 months ago by rkinas
3
Feature: Checkpointing on task level.
#161 opened 2 months ago by PhilipMay
2
MMLU evaluation fails with Mistral
#159 opened 2 months ago by sanchit-gandhi
6
Add Code-Centric Interface to LightEval for Enhanced Usability
#148 opened 2 months ago by adithya-s-k
4
Add LLM as a judge as a metric
#141 opened 2 months ago by clefourrier
1
`LatexTableWriter` created but never used.
#151 opened 2 months ago by PhilipMay
1
Homogenize logging system
#118 opened 3 months ago by clefourrier
0
Add dtype management in inference endpoints
#117 opened 2 months ago by clefourrier
1
Winogrande degraded results
#132 opened 2 months ago by opherlieber
5
Add a nanotron model to the test suite
#144 opened 2 months ago by clefourrier
0
Use config files for the model parameters
#128 opened 2 months ago by clefourrier
0
Add a logger in the metric functions
#135 opened 2 months ago by NathanHB
0
Add HumanEval and HumanEval+
#63 opened 3 months ago by lewtun
1
Add MT-Bench
#88 opened 2 months ago by NathanHB
0
human eval run
#130 opened 2 months ago by meitalbensinai
5
Problem with multiple tasks from the same dataset
#91 opened 2 months ago by clefourrier
1
Deploying evaluation for finetuned model as AWS SM pipeline step
#107 opened 2 months ago by Avistian
2
Add EQ Bench
#114 opened 3 months ago by lewtun
1
Add BBH subset back!
#125 opened 3 months ago by clefourrier
0
[BUG]: lighteval.utils import is_autogptq_available not working
#113 opened 3 months ago by fanminshi
3
Add AGIEval
#79 opened 3 months ago by lewtun
2
Push details to hub does not work
#99 opened 3 months ago by NathanHB
1
[IFEVAL] Stopping criteria fails for models with ChatML special tokens
#109 opened 3 months ago by lewtun
1
With chat templates, instructions shouldn't be prepended to system prompt
#110 opened 3 months ago by Whadup
5
StarCoder2 3B SFT models give CUDA OOM on IFEval
#105 opened 3 months ago by lewtun
3
Add BBH
#96 opened 3 months ago by clefourrier
0
Collate items in GenerativeTaskDataset by similar EOS token
#100 opened 3 months ago by clefourrier
0
To remember for version upgrades
#97 opened 3 months ago by clefourrier
0
Anomalously small values for `gemma-2b-it` on GSM8K
#82 opened 3 months ago by lewtun
4
Add the license to all files headers
#87 opened 3 months ago by clefourrier
0
Large memory usage on MATH
#80 opened 3 months ago by lewtun
3
Align GPQA zero-shot / few-shot prompts with paper?
#70 opened 3 months ago by lewtun
3
Cannot evaluate models on MATH: `TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'`
#73 opened 3 months ago by lewtun
3
Need to reupload TruthfulQA
#69 opened 3 months ago by clefourrier
1
Relax lower bound on `transformers` dependency?
#72 opened 3 months ago by lewtun
2
Cannot evaluate chat model on TruthfulQA (`TypeError: can only concatenate str (not "list") to str`)
#66 opened 3 months ago by lewtun
0
Anomalously high scores on GPQA
#68 opened 3 months ago by lewtun
4