Pinned Repositories
Yi
A series of large language models trained from scratch by developers @01-ai
LooGLE
ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models
WeightWatcher
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks
test-text-cnn
testing for text-cnn
benchbench
A package dedicated to running benchmark agreement testing
LVEval
Repository of the LV-Eval benchmark
opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
T-Eval
[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
GAOKAO-Bench
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.
Qwen2.5
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
cobraheleah's Repositories
cobraheleah/test-text-cnn
testing for text-cnn