Pinned Repositories
lm-evaluation-harness
A framework for few-shot evaluation of language models.
EQ-Bench
A benchmark for emotional intelligence in large language models.
antislop-sampler
entropix-gsm8k-eval
FastEval
Fast & more realistic evaluation of chat language models. Includes leaderboard.
gutenberg-dataset-scripts
lm-evaluation-harness
A framework for few-shot evaluation of language models.
MMLU-Pro-IRT
The scripts for MMLU-Pro, using a smaller IRT-tuned dataset.
Ollama-MMLU-Pro-IRT
Ollama-MMLU-Pro fork, using a smaller IRT-tuned subset of MMLU-Pro.
exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs.
sam-paech's Repositories
sam-paech/antislop-sampler
sam-paech/lm-evaluation-harness
A framework for few-shot evaluation of language models.
sam-paech/Ollama-MMLU-Pro-IRT
Ollama-MMLU-Pro fork, using a smaller IRT-tuned subset of MMLU-Pro.
sam-paech/entropix-gsm8k-eval
sam-paech/FastEval
Fast & more realistic evaluation of chat language models. Includes leaderboard.
sam-paech/gutenberg-dataset-scripts
sam-paech/MMLU-Pro-IRT
The scripts for MMLU-Pro, using a smaller IRT-tuned dataset.