wwngh1233's Stars
THUDM/ChatGLM2-6B
ChatGLM2-6B: An Open Bilingual Chat LLM
HqWu-HITCS/Awesome-Chinese-LLM
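A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are relatively cheap to train, covering base models, domain-specific fine-tuning and applications, datasets, and tutorials.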
openlm-research/open_llama
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
InternLM/InternLM
Official release of InternLM2.5 base and chat models. 1M context support
baichuan-inc/Baichuan-7B
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
lyogavin/airllm
AirLLM 70B inference with single 4GB GPU
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
jeinlee1991/chinese-llm-benchmark
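Chinese LLM capability leaderboard: currently covers 115 large models, including commercial models such as chatgpt, gpt4o, Baidu ERNIE Bot, Alibaba Tongyi Qianwen, iFlytek Spark, SenseTime senseChat, and minimax, as well as open-source models such as Baichuan, qwen2, glm4, yi, InternLM2, and llama3, evaluated across multiple capability dimensions. It provides not only a ranked leaderboard of capability scores but also the raw outputs of all models.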
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Xwin-LM/Xwin-LM
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
GAIR-NLP/factool
FacTool: Factuality Detection in Generative AI
haonan-li/CMMLU
CMMLU: Measuring massive multitask language understanding in Chinese
epfLLM/Megatron-LLM
Distributed trainer for LLMs
OpenLMLab/GAOKAO-Bench
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.
the-crypt-keeper/can-ai-code
Self-evaluating interview for AI coders
GAIR-NLP/abel
SOTA open-source math LLM
tianyi-lab/Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
FlagOpen/FlagEval
FlagEval is an evaluation toolkit for AI large foundation models.
kwai/KwaiYii
sufengniu/RefGPT
getcursor/eval
OpenMOSS/HalluQA
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
360CVGroup/SEEChat
Multimodal chatbot with computer vision capabilities integrated
MikeGu721/XiezhiBenchmark
Felixgithub2017/MMCU
Measuring Massive Multitask Chinese Understanding
my-other-github-account/llm-humaneval-benchmarks
qcri/LLMeBench
Benchmarking Large Language Models
HillZhang1999/NaSGEC
Code & Data for our Paper "NaSGEC: Multi-Domain Chinese Grammatical Error Correction for Native Speaker Texts" (ACL 2023 Findings)
hongwang600/FLAIR
h2oai/h2o-LLM-eval
Large language model evaluation framework with Elo leaderboard and A/B testing