cobraheleah

Pinned Repositories

Yi
A series of large language models trained from scratch by developers @01-ai
Language: Jupyter Notebook 7.8k 109 292492
LooGLE
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
Language: Python 179 3 127
WeightWatcher
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks
Language: Python 1.6k 33 240130
test-text-cnn
testing for text-cnn
Language: Python 10
benchbench
A package dedicated to running benchmark agreement testing
Language: Python 16 4 13
LVEval
Repository of LV-Eval Benchmark
Language: Python 59 2 68
opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Language: Python 5k 27 655525
T-Eval
[ACL 2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
Language: Python 263 3 5416
GAOKAO-Bench
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.
Language: Python 617 6 2643
Qwen2.5
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
Language: Shell 16.3k 107 9371.1k