Updated biweekly.
AI agents can think, act, and complete tasks by themselves.
But can they really replace our jobs?
🔥: Recommended papers
📖: Survey papers
⚖️: Benchmark papers
- Agent Capabilities
- AI Agents Architecture
- AI Agents Applications
- GenAI Agents Presentations
- "ACON: Optimizing Context Compression for Long-horizon LLM Agents" [paper]
- "Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models" [paper]
- "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory" [paper]
- "Learning on the Job: An Experience-Driven, Self-Evolving Agent for Long-Horizon Tasks" [paper]
- "Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents" [paper]
- "Where LLM Agents Fail and How They Can Learn From Failures" [paper]
- "AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning" [paper]
- "Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research" [paper]
- "Artificially intelligent agents in the social and behavioral sciences: A history and outlook" [paper]
- "Agentic Services Computing" [paper]
- "Don’t Just Fine-tune the Agent, Tune the Environment" [paper]
- "Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks" [paper]
- "LLM-REVal: Can We Trust LLM Reviewers Yet?" [paper]
- ⚖️ "Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation" [paper]
- 📖 "Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey" [paper]
- 📖 "Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents" [paper]
- 📖 "Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI" [paper]
- "LLM Agents Beyond Utility: An Open-Ended Perspective" [paper]
- "Deep Self-Evolving Reasoning" [paper]
- 📖 "A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications" [paper]
- "BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?" [paper]
- "The Need for Verification in AI-Driven Scientific Discovery" [paper]
- "What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models" [paper]
- "LLM-empowered Agents Simulation Framework for Scenario Generation in Service Ecosystem Governance Social World Models" [paper]
- "Language Models Do Not Follow Occam’s Razor: A Benchmark for Inductive and Abductive Reasoning" [paper]
- "LLM-empowered Agents Simulation Framework for Scenario Generation in Service Ecosystem Governance" [paper]
- "VulAgent: A Hypothesis Validation-Based Multi-Agent System for Software Vulnerability Detection" [paper]
- "Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building" [paper]
- "Agents of Discovery" [paper]
- "ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization" [paper]
- "Empowering LLMs with Parameterized Skills for Adversarial Long-Horizon Planning" [paper]
- "The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs" [paper]
- "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents" [paper]
- ⚖️ "SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?" [paper]
- "Orchestrator: Active Inference for Multi-Agent Systems in Long-Horizon Tasks" [paper]
- ⚖️ "LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering" [paper]
- "SWE-QA: Can Language Models Answer Repository-level Code Questions?" [paper]
- "ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory" [paper]
- "rStar2-Agent: Agentic Reasoning Technical Report" [paper]
- "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey" [paper]
- "Scaling Agents via Continual Pre-training" [paper]
- "Online Process Reward Learning for Agentic Reinforcement Learning" [paper]
- "Tree Search for LLM Agent Reinforcement Learning" [paper]
- "LIMI: Less is More for Agency" [paper]
- "ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution" [paper]
- "Towards General Agentic Intelligence via Environment Scaling" [paper]
- "Agent²: An Agent-Generates-Agent Framework for Reinforcement Learning Automation" [paper]
- "Self-Improving Embodied Foundation Models" [paper]
- "Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution" [paper]
- 📖 "LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios" [paper]
- 📖 "Reinforcement Learning Foundations for Deep Research Systems: A Survey" [paper]
- 📖 "LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions" [paper]
- 📖 "LLMs4All: A Review on Large Language Models for Research and Applications in Academic Disciplines" [paper]
- "Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance" [paper]
- 📖 "A Comprehensive Survey of Self-Evolving AI Agents" [paper]
- "HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research" [paper]
- "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
- "SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents" [paper]
- "HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents" [paper]
- ⚖️ "Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark" [paper]
- "Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory" [paper]
- "Memp: Exploring Agent Procedural Memory" [paper]
- "Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science" [paper]
- "Coarse-to-Fine Grounded Memory for LLM Agent Planning" [paper]
- "Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework" [paper]
- "Memento: Fine-tuning LLM Agents without Fine-tuning LLMs" [paper]
- "Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning" [paper]
- "K-Dense Analyst: Towards Fully Automated Scientific Analysis" [paper]
- 📖 "From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery" [paper]
- "The AI Data Scientist" [paper]
- "Spacer: Towards Engineered Scientific Inspiration" [paper]
- "BIODISCO: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation" [paper]
- "Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization" [paper]
- "MK2 at PBIG Competition: A Prompt Generation Solution" [paper]
- "LLM Agents Are the Antidote to Walled Gardens", University of Oxford. [paper]
- "Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture", State Key Laboratory. [paper]
- "Aime: Towards Fully-Autonomous Multi-Agent Framework", ByteDance. [paper]
- 📖 "A Survey of Context Engineering for Large Language Models" [paper]
- 📖 "A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents" [paper]
- "From Reasoning to Super-Intelligence: A Search-Theoretic Perspective", AA-I. [paper]
- "Making REST APIs Agent-Ready: From OpenAPI to Model Context Protocol Servers for Tool-Augmented LLMs", University of Michigan. [paper]
- "Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications", Shandong University. [paper]
- "Emotionally Intelligent Task-oriented Dialogue Systems: Architecture, Representation, and Optimisation", Heinrich Heine University. [paper]
- "Agent Ideate: A Framework for Product Idea Generation from Patents Using Agentic AI", TCS Research. [paper]
- "Agent Exchange: Shaping the Future of AI Agent Economics", Shanghai Jiao Tong University. [paper]
- "Evaluating LLM Agent Collusion in Double Auctions", Relativity, Stanford University, Arb Research. [paper]
- "Enhancing COBOL Code Explanations: A Multi-Agents Approach Using Large Language Models", Queen’s University, IBM USA. [paper]
- "CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale", Duke University, Army Research Laboratory. [paper]
- "Deep Researcher with Test-Time Diffusion", Google. [paper]
- "AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents", Accenture. [paper]
- "Agentic Retrieval of Topics and Insights from Earnings Calls", Bloomberg. [paper]
- ⚖️ "Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments", Fudan University. [paper]
- "Routine: A Structural Planning Framework for LLM Agent System in Enterprise", Digital China AI Research. [paper]
- "Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance", ByteDance. [paper]
- "Compliance Brain Assistant: Conversational Agentic AI for Assisting Compliance Tasks in Enterprise Environments", Meta. [paper]
- "Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems", Tsinghua University. [paper]
- ⚖️ "DABstep: Data Agent Benchmark for Multi-step Reasoning", Adyen, Hugging Face. [paper]
- 📖 "Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence", Zhejiang University. [paper]
- 📖 "The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist", University of North Texas. [paper]
- "AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench", Meta. [paper]
- "Open-ended Scientific Discovery via Bayesian Surprise", Allen Institute for AI. [paper]
- "Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery", Wrocław University. [paper]
- "Too Human to Model: The Uncanny Valley of LLMs in Social Simulation", Atmospheric Environmental Research. [paper]
- "Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust", CAMEL-AI.org. [paper]
- "LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra", Princeton University, Salesforce Research. [paper]
- 📖 "Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle" [paper]
- "Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models", CIFAR AI Chair. [paper]
- "MemOS: A Memory OS for AI System", MemTensor (Shanghai) Technology Co., Ltd. [paper]
- ⚖️ "Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions", UC San Diego. [paper]
- "MIRIX: Multi-Agent Memory System for LLM-Based Agents", MIRIX AI. [paper]
- ⚖️ "DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents" [paper]
- 📖 "From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents" [paper]
- 📖 "Deep Research Agents: A Systematic Examination And Roadmap" [paper]
- 📖 "Towards AI Search Paradigm" [paper]
- "Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge" [paper]
- "MMSearch-R1: Incentivizing LMMs to Search" [paper]
- "Towards Robust Fact-Checking: A Multi-Agent System with Advanced Evidence Retrieval" [paper]
- [Jun 2025] "AUTOMIND: Adaptive Knowledgeable Agent for Automated Data Science" [paper]
- 📖 [Jun 2025] "Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents" [paper]
- [Jun 2025] "SheetMind: An End-to-End LLM-Powered Multi-Agent Framework for Spreadsheet Automation" [paper]
- [Jun 2025] "SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications" [paper]
- [Jun 2025] "Towards Community-Driven Agents for Machine Learning Engineering" [paper]
- [Jun 2025] "MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement" [paper]
- "Oversight Structures for Agentic AI in Public-Sector Organizations" [paper]
- ⚖️ "AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance" [paper]
- 📖 "Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives" [paper]
- "Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era" [paper]
- "Improved LLM Agents for Financial Document Question Answering" [paper]
- ⚖️ "ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering" [paper]
- "Decide less, communicate more: On the construct validity of end-to-end fact-checking in medicine" [paper]
- "SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models" [paper]
- ⚖️ "SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents" [paper]
- "Managing Complex Failure Analysis Workflows with LLM-based Reasoning and Acting Agents" [paper]
- "AgenticControl: An Automated Control Design Framework Using Large Language Models" [paper]
- 📖 "A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools" [paper]
- 📖 "A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law" [paper]
- 📖 "Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models" [paper]
- "Table-R1: Inference-Time Scaling for Table Reasoning" [paper]
- "Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning" [paper]
- "Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning" [paper]
- "Agent RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving" [paper]
- "Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent" [paper]
- "An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents" [paper]
- "Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning" [paper]
- "MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning" [paper]
- "EvolveSearch: An Iterative Self-Evolving Search Agent" [paper]
- "VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection" [paper]
- "Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning" [paper]
- "RM-R1: Reward Modeling as Reasoning" [paper]
- "Reward Reasoning Model" [paper]
- "R3: Robust Rubric-Agnostic Reward Models" [paper]
- "AutoLibra: Agent Metric Induction from Open-Ended Feedback" [paper]
- "MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models (Short Version)" [paper]
- "MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents" [paper]
- "MARK: Memory Augmented Refinement of Knowledge" [paper]
- 📖 "Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions" [paper]
- "Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs" [paper]
- "Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution" [paper]
- "Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution" [paper]
- "Absolute Zero: Reinforced Self-play Reasoning with Zero Data" [paper]
- "Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks" [paper]
- "DEBATE, TRAIN, EVOLVE: Self-Evolution of Language Model Reasoning" [paper]
- "Self Rewarding Self Improving" [paper]
- "EvolveSearch: An Iterative Self-Evolving Search Agent" [paper]
- "AlphaEvolve: A coding agent for scientific and algorithmic discovery" [paper]
- "Meta-Design Matters: A Self-Design Multi-Agent System" [paper]
- "Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents" [paper]
- "SEW: Self-Evolving Agentic Workflows for Automated Code Generation" [paper]
- "Multi-Agent Collaboration via Evolving Orchestration" [paper]
- 📖 "Creativity in LLM-based Multi-Agent Systems: A Survey" [paper]
- ⚖️ "Benchmarking LLMs’ Swarm intelligence" [paper]
- "Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems" [paper]
- "Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications" [paper]
- "Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study" [paper]
- "34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery" [paper]
- "PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration" [paper]
- "R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution" [paper]
- 📖 "From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery" [paper]
- "Towards Artificial Intelligence Research Assistant for Expert-Involved Learning" [paper]
- "MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering" [paper]
- "ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering" [paper]
- "Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics" [paper]
- "Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories" [paper]
- "JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation" [paper]
- "MLZero: A Multi-Agent System for End-to-end Machine Learning Automation" [paper]
- "Can Agents Fix Agent Issues?" [paper]
- "Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI" [paper]
- "The Real Barrier to LLM Agent Usability is Agentic ROI" [paper]
- 📖 "A Survey on Large Language Model based Human-Agent Systems" [paper]
- 📖 "Vision-Language-Action Models: Concepts, Progress, Applications and Challenges" [paper]
- 📖 "Multi-agent Embodied AI: Advances and Future Directions" [paper]
- "Efficient Agent Training for Computer Use" [paper]
- ⚖️ "AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios" [paper]
- "Inference-Time Scaling for Generalist Reward Modeling" [paper]
- "Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead" [paper]
- "Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection" [paper]
- "Dual Engines of Thoughts: A Depth-Breadth Integration Framework for Open-Ended Analysis" [paper]
- 📖 "A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems" [paper]
- "Welcome to the Era of Experience" [paper]
- "SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills" [paper]
- "Exploring Expert Failures Improves LLM Agent Tuning" [paper]
- "Inducing Programmatic Skills for Agentic Tasks" [paper]
- "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory" [paper]
- "Local Prompt Optimization" [paper]
- "Revisiting Prompt Optimization with Large Reasoning Models - A Case Study on Event Extraction" [paper]
- "Iterative Trajectory Exploration for Multimodal Agents" [paper]
- "FlowReasoner: Reinforcing Query-Level Meta-Agents" [paper]
- "A Self-Improving Coding Agent" [paper]
- "Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models" [paper]
- "ToolRL: Reward is All Tool Learning Needs" [paper]
- "OTC: Optimal Tool Calls via Reinforcement Learning" [paper]
- "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities" [paper]
- 📖 "Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey" [paper]
- "The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search" [paper]
- "UFO2: The Desktop AgentOS" [paper]
- "AGENTADA: Skill-Adaptive Data Analytics for Tailored Insight Discovery" [paper]
- ⚖️ "BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents" [paper]
- "Toward Super Agent System with Hybrid AI Router" [paper]
- "AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents" [paper]
- [Apr 2025] "UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents" [paper]
- 📖 "Challenges and Paths Towards AI for Software Engineering" [paper]
- 📖 "Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems" [paper]
- 📖 "Adaptive Human-Agent Teaming: A Review of Empirical Studies from the Process Dynamics Perspective" [paper]
- 📖 "A Survey of AI Agent Protocols" [paper]
