An up-to-date (2023) list of PAPERS, CODEBASES, and BENCHMARKS on Decision Making using Foundation Models, including LLMs and VLMs.
Please feel free to send me pull requests or contact me to correct any mistakes.
- Survey of Foundation Models in Decision Making
- Foundation Models as World Models
- Foundation Models as Reward Models
- Foundation Models as Agent Models
- Foundation Models as Representation Encoders
- Multi-modal Decision Making Benchmarks
Survey of Foundation Models in Decision Making
- "A survey of reinforcement learning informed by natural language." arXiv, 2019. [paper]
- "A Survey on Transformers in Reinforcement Learning." arXiv, 2023. [paper]
- "Foundation models for decision making: Problems, methods, and opportunities." arXiv, 2023. [paper]
- "A Survey of Large Language Models." arXiv, June 2023. [paper][code]
Foundation Models as World Models
- IRIS: "Transformers are sample efficient world models." ICLR, 2023. [paper][code]
- UniPi: "Learning Universal Policies via Text-Guided Video Generation." arXiv, 2023. [paper][website]
- Dynalang: "Learning to Model the World with Language." arXiv, July 2023. [paper][website][code]
Foundation Models as Reward Models
- EAGER: "EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL." NeurIPS, 2022. [paper][code]
- "Reward design with language models." ICLR, 2023. [paper][code]
- ELLM: "Guiding Pretraining in Reinforcement Learning with Large Language Models." arXiv, 2023. [paper]
- "Language to Rewards for Robotic Skill Synthesis." arXiv, June 2023. [paper][website]
Foundation Models as Agent Models
Generative Agent
- FILM: "FILM: Following instructions in language with modular methods." ICLR, 2022. [paper][code][website]
- "Grounding large language models in interactive environments with online reinforcement learning." arXiv, 2023. [paper][code]
- Inner Monologue: "Inner Monologue: Embodied reasoning through planning with language models." arXiv, 2022. [paper][website]
- Plan4MC: "Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks." arXiv, 2023. [paper][code][website]
- ProgPrompt: "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models." ICRA, 2023. [paper][website]
- Text2Motion: "Text2Motion: From Natural Language Instructions to Feasible Plans." arXiv, Mar 2023. [paper][website]
- Voyager: "Voyager: An Open-Ended Embodied Agent with Large Language Models." arXiv, May 2023. [paper][code][website]
- Reflexion: "Reflexion: Language Agents with Verbal Reinforcement Learning." arXiv, Mar 2023. [paper][code]
- ReAct: "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR, 2023. [paper][code][website]
- "Generative Agents: Interactive Simulacra of Human Behavior." arXiv, Apr 2023. [paper][code]
- "Cognitive Architectures for Language Agents." arXiv, Sep 2023. [paper][code]
Robotic-Specific
- SayCan: "Do as I can, not as I say: Grounding language in robotic affordances." arXiv, 2022. [paper][code][website]
- PaLM-E: "PaLM-E: An embodied multimodal language model." arXiv, 2023. [paper][website]
- LM-Nav: "LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action." CoRL, 2022. [paper][code][website]
- ZSP: "Language models as zero-shot planners: Extracting actionable knowledge for embodied agents." ICML, 2022. [paper][code][website]
- DEPS: "Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents." arXiv, 2023. [paper][code]
- TidyBot: "TidyBot: Personalized Robot Assistance with Large Language Models." arXiv, 2023. [paper][website]
- ChatGPT for Robotics: "ChatGPT for robotics: Design principles and model abilities." Microsoft Auton. Syst. Robot. Res 2 (2023): 20. [paper]
- KNOWNO: "Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners." arXiv, July 2023. [paper]
- VoxPoser: "VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models." arXiv, July 2023. [paper][website]
- RT-1: "RT-1: Robotics Transformer for Real-World Control at Scale." arXiv, Dec 2022. [paper][code]
- RT-2: "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control." DeepMind, July 2023. [paper][website]
- MOO: "Open-World Object Manipulation using Pre-trained Vision-Language Models." arXiv, Mar 2023. [paper][website]
- EmbodiedGPT: "EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought." arXiv, May 2023. [paper][code][website]
- RoboCat: "RoboCat: A self-improving robotic agent." arXiv, Jun 2023. [paper][website]
Foundation Models as Representation Encoders
- CLIPort: "CLIPort: What and where pathways for robotic manipulation." CoRL, 2021. [paper][code][website]
- RT-1: "RT-1: Robotics transformer for real-world control at scale." arXiv, 2022. [paper][code][website]
- VIMA: "VIMA: General robot manipulation with multimodal prompts." ICML, 2023. [paper][code][website]
- Perceiver-Actor: "Perceiver-Actor: A multi-task transformer for robotic manipulation." CoRL, 2022. [paper][code][website]
- InstructRL: "Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models." arXiv, 2022. [paper]
- Hiveformer: "Instruction-driven history-aware policies for robotic manipulations." CoRL, 2022. [paper][code][website]
- LID: "Pre-trained language models for interactive decision-making." NeurIPS, 2022. [paper][code][website]
- LISA: "LISA: Learning Interpretable Skill Abstractions from Language." NeurIPS, 2022. [paper][code]
- LoReL: "Learning language-conditioned robot behavior from offline data and crowd-sourced annotation." CoRL, 2021. [paper][code][website]
- GRIF: "Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control." arXiv, 2023. [paper][website]
Multi-modal Decision Making Benchmarks
- Meta-World: "Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning." CoRL, 2019. [paper][code][website]
- RLBench: James, Stephen, et al. "RLBench: The robot learning benchmark & learning environment." IEEE Robotics and Automation Letters, 2020. [paper][code][website]
- VLMbench: Zheng, Kaizhi, et al. "VLMbench: A compositional benchmark for vision-and-language manipulation." NeurIPS, 2022. [paper][code][website]
- CALVIN: Mees, Oier, et al. "CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks." IEEE Robotics and Automation Letters, 2022. [paper][code][website]
- AI2-THOR: "AI2-THOR: An interactive 3D environment for visual AI." arXiv, 2017. [paper][code][website]
- ALFRED: "ALFRED: A benchmark for interpreting grounded instructions for everyday tasks." CVPR, 2020. [paper][code][website]
- VirtualHome: "Watch-and-help: A challenge for social perception and human-AI collaboration." arXiv, 2020. [paper][code][website]
- Ravens: "Transporter networks: Rearranging the visual world for robotic manipulation." CoRL, 2020. [paper][code][website]
- Housekeep: "Housekeep: Tidying virtual households using commonsense reasoning." ECCV, 2022. [paper][code][website]
- BEHAVIOR-1K: "BEHAVIOR-1K: A benchmark for embodied AI with 1,000 everyday activities and realistic simulation." CoRL, 2022. [paper][code][website]
- Habitat 2.0: "Habitat 2.0: Training home assistants to rearrange their habitat." NeurIPS, 2021. [paper][code][website]
- MineDojo: "MineDojo: Building open-ended embodied agents with internet-scale knowledge." arXiv, 2022. [paper][code][website]
- BabyAI: "BabyAI: A platform to study the sample efficiency of grounded language learning." ICLR, 2019. [paper][code]
- Generative Agents: "Generative Agents: Interactive Simulacra of Human Behavior." arXiv, Apr 2023. [paper][website][code]
- AgentBench: "AgentBench: Evaluating LLMs as Agents." arXiv, Aug 2023. [paper][website][code]
- Toolformer: "Toolformer: Language Models Can Teach Themselves to Use Tools." arXiv, Feb 2023. [paper][code]