Pinned Repositories
CMMLU
CMMLU: Measuring massive multitask language understanding in Chinese
ceval
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
BCEmbedding
Netease Youdao's open-source embedding and reranker models for RAG products.
MathBench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
MixEval
The official evaluation suite and dynamic data release for MixEval.
randomk
test
ZeroEval
A simple unified framework for evaluating LLMs
Waneila's Repositories
Waneila/CMMLU
CMMLU: Measuring massive multitask language understanding in Chinese
Waneila/randomk
Waneila/test
Waneila/ZeroEval
A simple unified framework for evaluating LLMs