huggingface/lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
Python · MIT
Issues
- 4
How to run 30b plus model with lighteval when accelerate launch failed? OOM
#155 opened by xiechengmude - 6
Version of a task should be configurable.
#172 opened by PhilipMay - 1
Add Sympy equivalence for MATH / GSM8K?
#170 opened by lewtun - 0
Evaluate EncoderDecoderModels
#183 opened by Bachstelze - 1
Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!
#176 opened by gody7334 - 6
Performance compared to lm-evaluation-harness
#179 opened by geoalgo - 8
[New Task] Add AlpacaEval LC
#139 opened by YannDubs - 2
add transformers model to be used as judge
#153 opened by NathanHB - 2
Do an intro notebook on how to use `lighteval`
#143 opened by clefourrier - 3
[MATH] Too many values to unpack (expected 2)
#140 opened by rkinas - 2
Feature: Checkpointing on task level.
#161 opened by PhilipMay - 6
MMLU evaluation fails with Mistral
#159 opened by sanchit-gandhi - 1
Add LLM as a judge as a metric
#141 opened by clefourrier - 1
`LatexTableWriter` created but never used.
#151 opened by PhilipMay - 0
Homogeneize logging system
#118 opened by clefourrier - 1
Add dtype management in inference endpoints
#117 opened by clefourrier - 5
Winogrande degraded results
#132 opened by opherlieber - 0
Add a nanotron model to the test suite
#144 opened by clefourrier - 0
Use config files for the model parameters
#128 opened by clefourrier - 0
Add a logger in the metric functions
#135 opened by NathanHB - 1
Add HumanEval and HumanEval+
#63 opened by lewtun - 0
Add MT-Bench
#88 opened by NathanHB - 5
human eval run
#130 opened by meitalbensinai - 1
Add EQ Bench
#114 opened by lewtun - 0
Add BBH subset back!
#125 opened by clefourrier - 2
Add AGIEval
#79 opened by lewtun - 1
Push details to hub does not work
#99 opened by NathanHB - 3
StarCoder2 3B SFT models give CUDA OOM on IFEval
#105 opened by lewtun - 0
Add BBH
#96 opened by clefourrier - 0
To remember for version upgrades
#97 opened by clefourrier - 4
Anomalously small values `gemma-2b-it` on GMS8k
#82 opened by lewtun - 0
Add the license to all files headers
#87 opened by clefourrier - 3
Large memory usage on MATH
#80 opened by lewtun - 3
Cannot evaluate models on MATH: `TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'`
#73 opened by lewtun - 1
Need to reupload TruthfulQA
#69 opened by clefourrier - 2
Relax lower bound on `transformers` dependency?
#72 opened by lewtun - 0
Cannot evaluate chat model on TruthfulQA (`TypeError: can only concatenate str (not "list") to str`)
#66 opened by lewtun - 4
Anomalously high scores on GPQA
#68 opened by lewtun