qcwthu's Stars
microsoft/autogen
A programming framework for agentic AI 🤖. PyPI: autogen-agentchat | Discord: https://aka.ms/autogen-discord | Office Hour: https://aka.ms/autogen-officehour
e2b-dev/awesome-ai-agents
A list of AI autonomous agents
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
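A minimal sketch of how loralib is typically applied, following the repo's README; the layer sizes and rank below are illustrative, not tied to any particular model:

```python
# Minimal loralib sketch: swap an nn.Linear for a LoRA-augmented layer,
# then freeze everything except the low-rank adapter weights.
# Dimensions and rank r are illustrative.
import torch
import torch.nn as nn
import loralib as lora

class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # lora.Linear adds trainable low-rank matrices A and B on top of
        # a frozen full-rank weight; r controls the adapter rank.
        self.proj = lora.Linear(768, 768, r=8)
        self.head = nn.Linear(768, 2)

    def forward(self, x):
        return self.head(self.proj(x))

model = TinyClassifier()
# Freeze all parameters except the LoRA adapters before training.
lora.mark_only_lora_as_trainable(model)
# After training, save only the (small) LoRA weights.
torch.save(lora.lora_state_dict(model), "lora_ckpt.pt")
```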
huggingface/trl
Train transformer language models with reinforcement learning.
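A minimal supervised fine-tuning sketch with trl's SFTTrainer; exact keyword arguments vary across trl versions, and the model and dataset names here are placeholders:

```python
# Minimal SFT sketch with trl. API details differ between trl versions;
# this follows the older README-style usage.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")
trainer = SFTTrainer(
    "facebook/opt-350m",        # model name or a preloaded model
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the raw training text
    max_seq_length=512,
)
trainer.train()
```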
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
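A minimal sketch of encoding text with FlagEmbedding's BGE models, following the repo's README; the model name and sentences are illustrative:

```python
# Minimal FlagEmbedding sketch: embed sentences with a BGE model and
# score similarity by inner product. Model name is illustrative.
from FlagEmbedding import FlagModel

model = FlagModel("BAAI/bge-large-en-v1.5", use_fp16=True)
sentences = ["retrieval-augmented generation", "dense passage retrieval"]
embeddings = model.encode(sentences)
# BGE embeddings are normalized, so the inner product is cosine similarity.
print(embeddings[0] @ embeddings[1])
```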
OpenBMB/ToolBench
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
OpenNMT/CTranslate2
Fast inference engine for Transformer models
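A minimal CTranslate2 sketch for batched translation; the model directory is a placeholder for a model already converted with one of the ct2 converter tools, and the token pieces are illustrative:

```python
# Minimal CTranslate2 sketch: run batched translation with a converted model.
# "ende_ctranslate2/" is a placeholder path to a converted model directory.
import ctranslate2

translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")
# CTranslate2 operates on pre-tokenized input (e.g., SentencePiece pieces).
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0].hypotheses[0])
```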
google/BIG-bench
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
allenai/open-instruct
embeddings-benchmark/mteb
MTEB: Massive Text Embedding Benchmark
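A minimal MTEB sketch following the repo's README usage: evaluate a SentenceTransformers model on one benchmark task. The task and model names are illustrative:

```python
# Minimal MTEB sketch: run a single embedding benchmark task against a
# SentenceTransformers model. Task/model names are illustrative.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="mteb_results")
```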
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
MineDojo/MineDojo
Building Open-Ended Embodied Agents with Internet-Scale Knowledge
WisdomShell/codeshell
A series of code large language models developed by PKU-KCL
THUDM/AgentTuning
AgentTuning: Enabling Generalized Agent Abilities for LLMs
princeton-nlp/MeZO
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
hendrycks/math
The MATH Dataset (NeurIPS 2021)
GanjinZero/RRHF
[NeurIPS 2023] RRHF & Wombat
ruixiangcui/AGIEval
haonan-li/CMMLU
CMMLU: Measuring massive multitask language understanding in Chinese
sail-sg/lorahub
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
OpenLemur/Lemur
[ICLR 2024] Lemur: Open Foundation Models for Language Agents
declare-lab/instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
onejune2018/Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of foundation LLMs, aiming to explore the technical frontier of generative AI.
suzgunmirac/BIG-Bench-Hard
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
FlagAI-Open/Aquila2
The official repo of the Aquila2 series proposed by BAAI, including pretrained and chat large language models.
GAIR-NLP/abel
SOTA open-source math LLM
microsoft/SmartPlay
SmartPlay is a benchmark for large language models (LLMs) that uses a variety of games to test important agent capabilities. It is designed to be easy to use and to support future development of LLMs.
abhishekpanigrahi1996/Skill-Localization-by-grafting
srhthu/LM-CompEval-Legal
Code for the paper "A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction"