Welcome to our comprehensive collection on LLM-based agents, with an emphasis on reasoning, memory, action, and related applications. Dive into a diverse array of academic papers, benchmarks, and open-source projects that explore the depths of LLM capabilities. This repo is actively maintained and frequently updated 🧑‍💻. Stay tuned for the latest advancements in the field!
🔥 for papers with >100 citations or repositories with >500 stars.
π for papers with >300 citations or repositories with >1500 stars.
- 🔥 (arXiv 2023.08) A Survey on Large Language Model based Autonomous Agents [Paper] [GitHub]
- 🔥 (arXiv 2023.09) The Rise and Potential of Large Language Model Based Agents: A Survey [Paper] [GitHub]
- (arXiv 2023.10) AI Alignment: A Comprehensive Survey [Paper]
- 🔥 (arXiv 2023.12) Retrieval-Augmented Generation for Large Language Models: A Survey [Paper] [GitHub]
- (arXiv 2024.01) Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security [Paper] [GitHub]
- (arXiv 2024.01) Large Language Model based Multi-Agents: A Survey of Progress and Challenges [Paper] [GitHub]
- 🔥 (TMLR'2024) Cognitive Architectures for Language Agents [Paper] [GitHub]
- (arXiv 2024.01) Agent AI: Surveying the Horizons of Multimodal Interaction [Paper]
- 🔥 (NeurIPS'2022) WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents [Paper] [GitHub] [Website]
- (EACL'2023) MTEB: Massive Text Embedding Benchmark [Paper] [GitHub] [Leaderboard]
- (EMNLP'2023) API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [Paper] [GitHub]
- 🔥 (NeurIPS'2023) PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change [Paper] [GitHub]
- (NeurIPS'2023) ToolQA: A Dataset for LLM Question Answering with External Tools [Paper] [GitHub]
- (arXiv 2023.09) Benchmarking Large Language Models in Retrieval-Augmented Generation [Paper] [GitHub]
- 🔥 (ICLR'2024) WebArena: A Realistic Web Environment for Building Autonomous Agents [Paper] [GitHub] [Website]
- π (ICLR'2024) AgentBench: Evaluating LLMs as Agents [Paper] [GitHub] [Website]
- (arXiv 2023.10) Benchmarking Large Language Models As AI Research Agents [Paper] [GitHub]
- (arXiv 2023.12) T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step [Paper] [GitHub] [Website]
- (arXiv 2024.01) VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks [Paper] [GitHub] [Website]
- (arXiv 2024.03) DevBench: A Comprehensive Benchmark for Software Development [Paper] [GitHub]
- (arXiv 2024.04) AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent [Paper] [GitHub]
- (arXiv 2024.04) STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [Paper] [GitHub]
- π (NeurIPS'2022) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [Paper]
- π (ICLR'2023) ReAct: Synergizing Reasoning and Acting in Language Models [Paper] [GitHub] [Website]
- 🔥 (arXiv 2023.05) ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models [Paper] [GitHub]
- 🔥 (EMNLP'2023) Reasoning with Language Model is Planning with World Model [Paper] [GitHub]
- π (NeurIPS'2023) Tree of Thoughts: Deliberate Problem Solving with Large Language Models [Paper] [GitHub]
- π (NeurIPS'2023) Reflexion: Language Agents with Verbal Reinforcement Learning [Paper] [GitHub]
- π (NeurIPS'2023) Self-Refine: Iterative Refinement with Self-Feedback [Paper] [GitHub]
- (NeurIPS'2023) Self-Evaluation Guided Beam Search for Reasoning [Paper] [GitHub] [Website]
- π (arXiv 2023.08) Graph of Thoughts: Solving Elaborate Problems with Large Language Models [Paper] [GitHub]
- (ICLR'2024) Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph [Paper] [GitHub]
- (ICLR'2024) Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models [Paper] [GitHub]
- (arXiv 2024.01) Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts [Paper]
- (arXiv 2024.01) Self-Rewarding Language Models [Paper]
- π (PMLR'2022) Improving language models by retrieving from trillions of tokens [Paper] [GitHub]
- (arXiv 2023.01) REPLUG: Retrieval-Augmented Black-Box Language Models [Paper]
- 🔥 (EMNLP'2023) Active Retrieval Augmented Generation [Paper] [GitHub]
- (EMNLP'2023 findings) Self-Knowledge Guided Retrieval Augmentation for Large Language Models [Paper]
- π (ICLR'2024) DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines [Paper] [GitHub]
- (ICLR'2024) Retrieval meets Long Context Large Language Models [Paper]
- 🔥 (ICLR'2024) Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [Paper] [GitHub] [Website]
- (NAACL'2024) REST: Retrieval-Based Speculative Decoding [Paper] [GitHub]
- (arXiv 2023.11) Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models [Paper]
- (arXiv 2024.02) G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [Paper] [GitHub]
- (arXiv 2024.03) RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation [Paper] [GitHub] [Website] [Demo]
- (arXiv 2024.03) RAFT: Adapting Language Model to Domain Specific RAG [Paper] [GitHub] [Website]
- (arXiv 2024.04) Introducing Super RAGs in Mistral 8x7B-v1 [Paper]
- 🔥 (CVPR'2023) Visual Programming: Compositional visual reasoning without training [Paper] [GitHub]
- π (NeurIPS'2023) Toolformer: Language Models Can Teach Themselves to Use Tools [Paper] [GitHub]
- π (arXiv 2023.05) Gorilla: Large Language Model Connected with Massive APIs [Paper] [GitHub] [Website]
- (arXiv 2023.05) ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings [Paper] [GitHub]
- (arXiv 2023.06) ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases [Paper] [GitHub]
- π (ICLR'2024) ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [Paper] [GitHub]
- π (TMLR'2024) Voyager: An Open-Ended Embodied Agent with Large Language Models [Paper] [GitHub]
- 🔥 (arXiv 2023.10) AgentTuning: Enabling Generalized Agent Abilities for LLMs [Paper] [GitHub] [Website]
- (arXiv 2023.10) FireAct: Toward Language Agent Fine-tuning [Paper] [GitHub] [Website]
- (arXiv 2024.02) AUTOACT: Automatic Agent Learning from Scratch via Self-Planning [Paper] [GitHub] [Website]
- (arXiv 2024.03) Agent Lumos: Unified and Modular Training for Open-Source Language Agents [Paper] [GitHub] [Website]
- (arXiv 2024.03) Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models [Paper] [GitHub] [Website]
- π (NeurIPS'2022) Training language models to follow instructions with human feedback [Paper] [GitHub]
- π (NeurIPS'2023) Direct Preference Optimization: Your Language Model is Secretly a Reward Model [Paper] [GitHub]
- (arXiv 2024.01) Self-Rewarding Language Models [Paper] [GitHub]
- (arXiv 2024.02) Noise Contrastive Alignment of Language Models with Explicit Rewards [Paper] [GitHub]
- 🔥 (NeurIPS'2023) Mind2Web: Towards a Generalist Agent for the Web [Paper] [GitHub]
- (NeurIPS'2023 workshops) LASER: LLM Agent with State-Space Exploration for Web Navigation [Paper]
- (ICLR'2024) A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis [Paper] [GitHub]
- 🔥 (arXiv 2024.01) GPT-4V(ision) is a Generalist Web Agent, if Grounded [Paper] [GitHub] [Website]
- (arXiv 2023.08) RecMind: Large Language Model Powered Agent For Recommendation [Paper]
- (arXiv 2023.10) On Generative Agents in Recommendation [Paper] [GitHub]
- (arXiv 2023.10) AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems [Paper]
- 🔥 (ICLR'2024) SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [Paper] [GitHub] [Website]
- π (arXiv 2024.04) AutoCodeRover: Autonomous Program Improvement [Paper] [GitHub]
- (arXiv 2024.04) Can Language Models Solve Olympiad Programming? [Paper] [GitHub]
- (arXiv 2023.10) Can large language models provide useful feedback on research papers? A large-scale empirical analysis [Paper] [GitHub]
- (arXiv 2024.01) MARG: Multi-Agent Review Generation for Scientific Papers [Paper] [GitHub]
- (arXiv 2024.02) Reviewer2: Optimizing Review Generation Through Prompt Generation [Paper] [GitHub]
- (CHI'2024) A Design Space for Intelligent and Interactive Writing Assistants [Paper] [GitHub] [Website]
- (ICLR'2024) SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series [Paper] [GitHub]
- (ICLR'2024 workshops) FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design [Paper]
- π (UIST'2023) Generative Agents: Interactive Simulacra of Human Behavior [Paper] [GitHub]
- π (NeurIPS'2023) HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [Paper] [GitHub]
- 🔥 (ICLR'2024) ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [Paper] [GitHub]
- (arXiv 2024.04) Octopus v2: On-device language model for super agent [Paper]
- (arXiv 2024.04) Empowering Biomedical Discovery with AI Agents [Paper]
Title | Link | Description |
---|---|---|
FastChat | lm-sys/FastChat | An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. |
🦜️🔗 LangChain | langchain-ai/langchain | 🦜🔗 Build context-aware reasoning applications |
🗂️ LlamaIndex 🦙 | run-llama/llama_index | LlamaIndex is a data framework for your LLM applications |
LLaMA-Factory | hiyouga/Llama-Factory | Unify Efficient Fine-Tuning of 100+ LLMs |
Petals 🌸 | bigscience-workshop/petals | 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading |
Open-Assistant | LAION-AI/Open-Assistant | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. |

Title | Link | Description |
---|---|---|
CAMEL 🐫 | camel-ai/camel | 🐫 CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society |
AutoGen | microsoft/autogen | A programming framework for agentic AI. |
🤖 AgentVerse 🪐 | OpenBMB/AgentVerse | 🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation |

Title | Link | Description |
---|---|---|
Chroma | chroma-core/chroma | the AI-native open-source embedding database |
Faiss | facebookresearch/faiss | A library for efficient similarity search and clustering of dense vectors. |