Awesome-LLM-Agent

Welcome to our comprehensive collection of resources on LLM-based agents, with an emphasis on reasoning, memory, action, and related applications. Dive into a diverse array of academic papers, benchmarks, and open-source projects that explore the depths of LLM capabilities. This repo is actively maintained and frequently updated 🧑‍💻. Stay tuned for the latest advancements in the field 🚀!

Papers
Open-Source Projects

Papers

🔥 for papers with >100 citations or repositories with >500 stars.

🚀 for papers with >300 citations or repositories with >1500 stars.

Survey 🔍

🔥 (arXiv 2023.08) A Survey on Large Language Model based Autonomous Agents [Paper] [GitHub]
🔥 (arXiv 2023.09) The Rise and Potential of Large Language Model Based Agents: A Survey [Paper] [GitHub]
(arXiv 2023.10) AI Alignment: A Comprehensive Survey [Paper]
🔥 (arXiv 2023.12) Retrieval-Augmented Generation for Large Language Models: A Survey [Paper] [GitHub]
(arXiv 2024.01) Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security [Paper] [GitHub]
(arXiv 2024.01) Large Language Model based Multi-Agents: A Survey of Progress and Challenges [Paper] [GitHub]
🔥 (TMLR'2024) Cognitive Architectures for Language Agents [Paper] [GitHub]
(arXiv 2024.01) Agent AI: Surveying the Horizons of Multimodal Interaction [Paper]

Benchmark 📈

🔥 (NeurIPS'2022) WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents [Paper] [GitHub] [Website]
(EACL'2023) MTEB: Massive Text Embedding Benchmark [Paper] [GitHub] [Leaderboard]
(EMNLP'2023) API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [Paper] [GitHub]
🔥 (NeurIPS'2023) PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change [Paper] [GitHub]
(NeurIPS'2023) ToolQA: A Dataset for LLM Question Answering with External Tools [Paper] [GitHub]
(arXiv 2023.09) Benchmarking Large Language Models in Retrieval-Augmented Generation [Paper] [GitHub]
🔥 (ICLR'2024) WebArena: A Realistic Web Environment for Building Autonomous Agents [Paper] [GitHub] [Website]
🚀 (ICLR'2024) AgentBench: Evaluating LLMs as Agents [Paper] [GitHub] [Website]
(arXiv 2023.10) Benchmarking Large Language Models As AI Research Agents [Paper] [GitHub]
(arXiv 2023.12) T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step [Paper] [GitHub] [Website]
(arXiv 2024.01) VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks [Paper] [GitHub] [Website]
(arXiv 2024.03) DevBench: A Comprehensive Benchmark for Software Development [Paper] [GitHub]
(arXiv 2024.04) AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent [Paper] [GitHub]
(arXiv 2024.04) STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [Paper] [GitHub]

Reasoning and Prompt Engineering 💡

🚀 (NeurIPS'2022) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [Paper]
🚀 (ICLR'2023) ReAct: Synergizing Reasoning and Acting in Language Models [Paper] [GitHub] [Website]
🔥 (arXiv 2023.05) ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models [Paper] [GitHub]
🔥 (EMNLP'2023) Reasoning with Language Model is Planning with World Model [Paper] [GitHub]
🚀 (NeurIPS'2023) Tree of Thoughts: Deliberate Problem Solving with Large Language Models [Paper] [GitHub]
🚀 (NeurIPS'2023) Reflexion: Language Agents with Verbal Reinforcement Learning [Paper] [GitHub]
🚀 (NeurIPS'2023) Self-Refine: Iterative Refinement with Self-Feedback [Paper] [GitHub]
(NeurIPS'2023) Self-Evaluation Guided Beam Search for Reasoning [Paper] [GitHub] [Website]
🚀 (arXiv 2023.08) Graph of Thoughts: Solving Elaborate Problems with Large Language Models [Paper] [GitHub]
(ICLR'2024) Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph [Paper] [GitHub]
(ICLR'2024) Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models [Paper] [GitHub]
(arXiv 2024.01) Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts [Paper]
(arXiv 2024.01) Self-Rewarding Language Models [Paper]

Memory and Retrieval Augmented Generation ⚙️

🚀 (ICML'2022) Improving language models by retrieving from trillions of tokens [Paper] [GitHub]
(arXiv 2023.01) REPLUG: Retrieval-Augmented Black-Box Language Models [Paper]
🔥 (EMNLP'2023) Active Retrieval Augmented Generation [Paper] [GitHub]
(EMNLP'2023 findings) Self-Knowledge Guided Retrieval Augmentation for Large Language Models [Paper]
🚀 (ICLR'2024) DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines [Paper] [GitHub]
(ICLR'2024) Retrieval meets Long Context Large Language Models [Paper]
🔥 (ICLR'2024) Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [Paper] [GitHub] [Website]
(NAACL'2024) REST: Retrieval-Based Speculative Decoding [Paper] [GitHub]
(arXiv 2023.11) Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models [Paper]
(arXiv 2024.02) G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [Paper] [GitHub]
(arXiv 2024.03) RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation [Paper] [GitHub] [Website] [Demo]
(arXiv 2024.03) RAFT: Adapting Language Model to Domain Specific RAG [Paper] [GitHub] [Website]
(arXiv 2024.04) Introducing Super RAGs in Mistral 8x7B-v1 [Paper]

Action and Tool Using 🛠️

🔥 (CVPR'2023) Visual Programming: Compositional visual reasoning without training [Paper] [GitHub]
🚀 (NeurIPS'2023) Toolformer: Language Models Can Teach Themselves to Use Tools [Paper] [GitHub]
🚀 (arXiv 2023.05) Gorilla: Large Language Model Connected with Massive APIs [Paper] [GitHub] [Website]
(arXiv 2023.05) ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings [Paper] [GitHub]
(arXiv 2023.06) ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases [Paper] [GitHub]
🚀 (ICLR'2024) ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [Paper] [GitHub]
🚀 (TMLR'2024) Voyager: An Open-Ended Embodied Agent with Large Language Models [Paper] [GitHub]

Agent Fine-Tuning 🤖

🔥 (arXiv 2023.10) AgentTuning: Enabling Generalized Agent Abilities for LLMs [Paper] [GitHub] [Website]
(arXiv 2023.10) FireAct: Toward Language Agent Fine-tuning [Paper] [GitHub] [Website]
(arXiv 2024.02) AUTOACT: Automatic Agent Learning from Scratch via Self-Planning [Paper] [GitHub] [Website]
(arXiv 2024.03) Agent Lumos: Unified and Modular Training for Open-Source Language Agents [Paper] [GitHub] [Website]
(arXiv 2024.03) Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models [Paper] [GitHub] [Website]

LLM Fine-Tuning 🧠

🚀 (NeurIPS'2022) Training language models to follow instructions with human feedback [Paper] [GitHub]
🚀 (NeurIPS'2023) Direct Preference Optimization: Your Language Model is Secretly a Reward Model [Paper] [GitHub]
(arXiv 2024.01) Self-Rewarding Language Models [Paper] [GitHub]
(arXiv 2024.02) Noise Contrastive Alignment of Language Models with Explicit Rewards [Paper] [GitHub]

Applications 💻

Web Agents

🔥 (NeurIPS'2023) Mind2Web: Towards a Generalist Agent for the Web [Paper] [GitHub]
(NeurIPS'2023 workshops) LASER: LLM Agent with State-Space Exploration for Web Navigation [Paper]
(ICLR'2024) A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis [Paper] [GitHub]
🔥 (arXiv 2024.01) GPT-4V(ision) is a Generalist Web Agent, if Grounded [Paper] [GitHub] [Website]

Recommender Agents

(arXiv 2023.08) RecMind: Large Language Model Powered Agent For Recommendation [Paper]
(arXiv 2023.10) On Generative Agents in Recommendation [Paper] [GitHub]
(arXiv 2023.10) AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems [Paper]

Code Agents

🔥 (ICLR'2024) SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [Paper] [GitHub] [Website]
🚀 (arXiv 2024.04) AutoCodeRover: Autonomous Program Improvement [Paper] [GitHub]
(arXiv 2024.04) Can Language Models Solve Olympiad Programming? [Paper] [GitHub]

Paper Review Agents

(arXiv 2023.10) Can large language models provide useful feedback on research papers? A large-scale empirical analysis [Paper] [GitHub]
(arXiv 2024.01) MARG: Multi-Agent Review Generation for Scientific Papers [Paper] [GitHub]
(arXiv 2024.02) Reviewer2: Optimizing Review Generation Through Prompt Generation [Paper] [GitHub]
(CHI'2024) A Design Space for Intelligent and Interactive Writing Assistants [Paper] [GitHub] [Website]

Trading Agents

(ICLR'2024) SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series [Paper] [GitHub]
(ICLR'2024 workshops) FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design [Paper]

Others

🚀 (UIST'2023) Generative Agents: Interactive Simulacra of Human Behavior [Paper] [GitHub]
🚀 (NeurIPS'2023) HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [Paper] [GitHub]
🔥 (ICLR'2024) ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [Paper] [GitHub]
(arXiv 2024.04) Octopus v2: On-device language model for super agent [Paper]
(arXiv 2024.04) Empowering Biomedical Discovery with AI Agents [Paper]

Open-Source Projects

LLM Platform

Title	Link	Description
FastChat	lm-sys/FastChat	An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
🦜️🔗 LangChain	langchain-ai/langchain	🦜🔗 Build context-aware reasoning applications
🗂️ LlamaIndex 🦙	run-llama/llama_index	LlamaIndex is a data framework for your LLM applications
LLaMA-Factory	hiyouga/LLaMA-Factory	Unify Efficient Fine-Tuning of 100+ LLMs
Petals🌸	bigscience-workshop/petals	🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Open-Assistant	LAION-AI/Open-Assistant	OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Multi-Agent Framework

Title	Link	Description
CAMEL🐫	camel-ai/camel	🐫 CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society
AutoGen	microsoft/autogen	A programming framework for agentic AI.
🤖 AgentVerse🪐	OpenBMB/AgentVerse	🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation

Vector Database

Title	Link	Description
Chroma	chroma-core/chroma	the AI-native open-source embedding database
Faiss	facebookresearch/faiss	A library for efficient similarity search and clustering of dense vectors.

zjwu0522/Awesome-LLM-Agent