This is a curated list of research on "Embodied AI or agents with Large Language Models", maintained by haonan.
Watch this repository for the latest updates, and feel free to open a pull request if you find an interesting paper!
- Survey
- Advanced Agent Applications
- LLMs with RL or World Model
- Planning and Manipulation or Pretraining
- Multi-Agent Learning and Coordination
- Vision and Language Navigation
- Detection
- 3D Grounding
- Interactive Embodied Learning
- Rearrangement
- Benchmark
- Simulator
- Others
Figure 1. Trend of Embodied Agent with LLMs.[1]
Figure 2. An envisioned Agent society.[2]
- Agent AI: Surveying the Horizons of Multimodal Interaction [arXiv 2024]
  Stanford University, Microsoft Research, Redmond, University of California, Los Angeles, University of Washington, Microsoft Gaming
- Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents [arXiv 2023]
  Shanghai Jiao Tong University, Amazon Web Services, Yale University
- The Rise and Potential of Large Language Model Based Agents: A Survey [arXiv 2023]
  Fudan NLP Group, miHoYo Inc
- A Survey on LLM-based Autonomous Agents [arXiv 2023]
  Gaoling School of Artificial Intelligence, Renmin University of China
- AppAgent: Multimodal Agents as Smartphone Users [Project page] [Github]
  Chi Zhang∗, Zhao Yang∗, Jiaxuan Liu∗, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu†
  Tencent
- Eureka: Human-Level Reward Design via Coding Large Language Models [Project page] [Github]
  Jason Ma1,2, William Liang2, Guanzhi Wang1,3, De-An Huang1, Osbert Bastani2, Dinesh Jayaraman2, Yuke Zhu1,4, Linxi "Jim" Fan1, Anima Anandkumar1
  1NVIDIA; 2UPenn; 3Caltech; 4UT Austin
- RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds [arXiv 2023]
- Can Language Agents Be Alternatives to PPO? A Preliminary Empirical Study on OpenAI Gym [arXiv 2023]
- RoboGPT: An intelligent agent of making embodied long-term decisions for daily instruction tasks [arXiv 2023]
- Aligning Agents like Large Language Models [arXiv 2023]
- AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents [ICLR 2024 spotlight]
- STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models [arXiv 2023]
- Text2Reward: Dense Reward Generation with Language Models for Reinforcement Learning [ICLR 2024 spotlight]
- Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning [arXiv 2023]
- Online Continual Learning for Interactive Instruction Following Agents [ICLR 2024]
- ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning [arXiv 2023]
- Informing Reinforcement Learning Agents by Grounding Natural Language to Markov Decision Processes [arXiv 2023]
- Learning to Model the World with Language [arXiv 2023]
- MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning [ICLR 2024]
- Language Reward Modulation for Pretraining Reinforcement Learning [arXiv 2023] [Github]
  Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel
  UC Berkeley
- Guiding Pretraining in Reinforcement Learning with Large Language Models [ICML 2023]
  Yuqing Du1*, Olivia Watkins1*, Zihan Wang2, Cédric Colas3,4, Trevor Darrell1, Pieter Abbeel1, Abhishek Gupta2, Jacob Andreas3
  1Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA 2University of Washington, Seattle 3Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory 4Inria, Flowers Laboratory
- See and Think: Embodied Agent in Virtual Environment [arXiv 2023]
  Zhonghan Zhao1*, Wenhao Chai2*, Xuan Wang1*, Li Boyi1, Shengyu Hao1, Shidong Cao1, Tian Ye3, Jenq-Neng Hwang2, Gaoang Wang1
  1Zhejiang University 2University of Washington 3Hong Kong University of Science and Technology (GZ)
- Agent Instructs Large Language Models to be General Zero-Shot Reasoners [arXiv 2023]
  Nicholas Crispino1, Kyle Montgomery1, Fankun Zeng1, Dawn Song2, Chenguang Wang1
  1Washington University in St. Louis, 2UC Berkeley
- JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models [NeurIPS 2023] [Project Page]
  Zihao Wang1,2, Shaofei Cai1,2, Anji Liu3, Yonggang Jin4, Jinbing Hou4, Bowei Zhang5, Haowei Lin1,2, Zhaofeng He4, Zilong Zheng6, Yaodong Yang1, Xiaojian Ma6†, Yitao Liang1†
  1Institute for Artificial Intelligence, Peking University, 2School of Intelligence Science and Technology, Peking University, 3Computer Science Department, University of California, Los Angeles, 4Beijing University of Posts and Telecommunications, 5School of Electronics Engineering and Computer Science, Peking University, 6Beijing Institute for General Artificial Intelligence (BIGAI)
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents [NeurIPS 2023]
  Zihao Wang1,2, Shaofei Cai1,2, Guanzhou Chen3, Anji Liu4, Xiaojian Ma4, Yitao Liang1,5†
  1Institute for Artificial Intelligence, Peking University, 2School of Intelligence Science and Technology, Peking University, 3School of Computer Science, Beijing University of Posts and Telecommunications, 4Computer Science Department, University of California, Los Angeles, 5Beijing Institute for General Artificial Intelligence (BIGAI)
- CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society [NeurIPS 2023] [Github] [Project page]
  Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem
  King Abdullah University of Science and Technology (KAUST)
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents [arXiv 2022] [Github] [Project page]
  Wenlong Huang1, Pieter Abbeel1, Deepak Pathak2, Igor Mordatch3
  1UC Berkeley, 2Carnegie Mellon University, 3Google
- FILM: Following Instructions in Language with Modular Methods [ICLR 2022] [Github] [Project page]
  So Yeon Min1, Devendra Singh Chaplot2, Pradeep Ravikumar1, Yonatan Bisk1, Ruslan Salakhutdinov1
  1Carnegie Mellon University 2Facebook AI Research
- Embodied Task Planning with Large Language Models [arXiv 2023] [Github] [Project page] [Demo] [Huggingface Model]
  Zhenyu Wu1, Ziwei Wang2,3, Xiuwei Xu2,3, Jiwen Lu2,3, Haibin Yan1*
  1School of Automation, Beijing University of Posts and Telecommunications, 2Department of Automation, Tsinghua University, 3Beijing National Research Center for Information Science and Technology
- SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning [arXiv 2023]
  Yue Wu1,4*, Shrimai Prabhumoye2, So Yeon Min1, Yonatan Bisk1, Ruslan Salakhutdinov1, Amos Azaria3, Tom Mitchell1, Yuanzhi Li1,4
  1Carnegie Mellon University, 2NVIDIA, 3Ariel University, 4Microsoft Research
- PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning [CVPR 2022 (Oral)] [Project page] [Github]
  Santhosh Kumar Ramakrishnan1,2, Devendra Singh Chaplot1, Ziad Al-Halah2, Jitendra Malik1,3, Kristen Grauman1,2
  1Facebook AI Research, 2UT Austin, 3UC Berkeley
- Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics [ICLR 2023] [Project page] [Github]
  Kuo-Hao Zeng1, Luca Weihs2, Roozbeh Mottaghi1, Ali Farhadi1
  1Paul G. Allen School of Computer Science & Engineering, University of Washington, 2PRIOR @ Allen Institute for AI
- Modeling Dynamic Environments with Scene Graph Memory [ICML 2023]
  Andrey Kurenkov1, Michael Lingelbach1, Tanmay Agarwal1, Emily Jin1, Chengshu Li1, Ruohan Zhang1, Li Fei-Fei1, Jiajun Wu1, Silvio Savarese2, Roberto Martín-Martín3
  1Department of Computer Science, Stanford University 2Salesforce AI Research 3Department of Computer Science, University of Texas at Austin
- Reasoning with Language Model is Planning with World Model [arXiv 2023]
  Shibo Hao∗♣, Yi Gu∗♣, Haodi Ma♢, Joshua Jiahua Hong♣, Zhen Wang♣♠, Daisy Zhe Wang♢, Zhiting Hu♣
  ♣UC San Diego, ♢University of Florida, ♠Mohamed bin Zayed University of Artificial Intelligence
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [arXiv 2022]
  Robotics at Google, Everyday Robots
- Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling [ICML 2023]
  Kolby Nottingham1, Prithviraj Ammanabrolu2, Alane Suhr2, Yejin Choi3,2, Hannaneh Hajishirzi3,2, Sameer Singh1,2, Roy Fox1
  1Department of Computer Science, University of California Irvine 2Allen Institute for Artificial Intelligence 3Paul G. Allen School of Computer Science
- Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents [ICCV 2023]
  Byeonghwi Kim, Jinyeon Kim, Yuyeong Kim1,*, Cheolhong Min, Jonghyun Choi†
  Yonsei University 1Gwangju Institute of Science and Technology
- Inner Monologue: Embodied Reasoning through Planning with Language Models [CoRL 2022] [Project page]
  Robotics at Google
- Language Models Meet World Models: Embodied Experiences Enhance Language Models [arXiv 2023] [Twitter]
  Jiannan Xiang∗♠, Tianhua Tao∗♠, Yi Gu♠, Tianmin Shu♢, Zirui Wang♠, Zichao Yang♡, Zhiting Hu♠
  ♠UC San Diego, ♣UIUC, ♢MIT, ♡Carnegie Mellon University
- AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation [arXiv 2023] [Video]
  Chuhao Jin1*, Wenhui Tan1*, Jiange Yang2*, Bei Liu3†, Ruihua Song1, Limin Wang2, Jianlong Fu3†
  1Renmin University of China, 2Nanjing University, 3Microsoft Research
- A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution [CoRL 2021] [Project page] [Poster]
  Valts Blukis1,2, Chris Paxton1, Dieter Fox1,3, Animesh Garg1,4, Yoav Artzi2
  1NVIDIA 2Cornell University 3University of Washington 4University of Toronto, Vector Institute
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models [ICCV 2023] [Project page] [Github]
  Chan Hee Song1, Jiaman Wu1, Clayton Washington1, Brian M. Sadler2, Wei-Lun Chao1, Yu Su1
  1The Ohio State University, 2DEVCOM ARL
- Code as Policies: Language Model Programs for Embodied Control [arXiv 2023] [Project page] [Github] [Blog] [Colab]
  Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng
  Robotics at Google
- 3D-LLM: Injecting the 3D World into Large Language Models [arXiv 2023]
  Yining Hong1, Haoyu Zhen2, Peihao Chen3, Shuhong Zheng4, Yilun Du5, Zhenfang Chen6, Chuang Gan6,7
  1UCLA 2SJTU 3SCUT 4UIUC 5MIT 6MIT-IBM Watson AI Lab 7UMass Amherst
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models [arXiv 2023] [Project page] [Online Demo]
  Wenlong Huang1, Chen Wang1, Ruohan Zhang1, Yunzhu Li1,2, Jiajun Wu1, Li Fei-Fei1
  1Stanford University 2University of Illinois Urbana-Champaign
- PaLM-E: An Embodied Multimodal Language Model [ICML 2023] [Project page]
  1Robotics at Google 2TU Berlin 3Google Research
- Large Language Models as Commonsense Knowledge for Large-Scale Task Planning [arXiv 2023]
  Zirui Zhao, Wee Sun Lee, David Hsu
  School of Computing, National University of Singapore
- War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars [arXiv 2023]
  Wenyue Hua1*, Lizhou Fan2*, Lingyao Li2, Kai Mei1, Jianchao Ji1, Yingqiang Ge1, Libby Hemphill2, Yongfeng Zhang1
  1Rutgers University, 2University of Michigan
- MindAgent: Emergent Gaming Interaction [arXiv 2023]
  Ran Gong*1†, Qiuyuan Huang*2‡, Xiaojian Ma*1, Hoi Vo3, Zane Durante†4, Yusuke Noda3, Zilong Zheng5, Song-Chun Zhu1,5,6,7,8, Demetri Terzopoulos1, Li Fei-Fei4, Jianfeng Gao2
  1UCLA; 2Microsoft Research, Redmond; 3Xbox Team, Microsoft; 4Stanford; 5BIGAI; 6PKU; 7THU; 8UCLA
- Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum [ICML 2023]
  Jigang Kim*1,2, Daesol Cho*1,2, H. Jin Kim1,3
  1Seoul National University, 2Artificial Intelligence Institute of Seoul National University (AIIS), 3Automation and Systems Research Institute (ASRI)
  Note: This paper mainly focuses on reinforcement learning for Embodied AI.
- Adaptive Coordination in Social Embodied Rearrangement [ICML 2023]
  Andrew Szot1,2, Unnat Jain1, Dhruv Batra1,2, Zsolt Kira2, Ruta Desai1, Akshara Rai1
  1Meta AI 2Georgia Institute of Technology
- IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience [arXiv 2023]
  Joanne Truong1,2, April Zitkovich1, Sonia Chernova2, Dhruv Batra2,3, Tingnan Zhang1, Jie Tan1, Wenhao Yu1
  1Robotics at Google 2Georgia Institute of Technology 3Meta AI
- ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation [ICML 2023]
  Kaiwen Zhou1, Kaizhi Zheng1, Connor Pryor1, Yilin Shen2, Hongxia Jin2, Lise Getoor1, Xin Eric Wang1
  1University of California, Santa Cruz 2Samsung Research America
- NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models [arXiv 2023]
  Gengze Zhou1, Yicong Hong2, Qi Wu1
  1The University of Adelaide 2The Australian National University
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [arXiv 2023] [Github]
  Siyuan Huang1,2, Zhengkai Jiang4, Hao Dong3, Yu Qiao2, Peng Gao2, Hongsheng Li5
  1Shanghai Jiao Tong University, 2Shanghai AI Laboratory, 3CFCS, School of CS, PKU, 4University of Chinese Academy of Sciences, 5The Chinese University of Hong Kong
- DetGPT: Detect What You Need via Reasoning [arXiv 2023]
  Renjie Pi1∗, Jiahui Gao2*, Shizhe Diao1∗, Rui Pan1, Hanze Dong1, Jipeng Zhang1, Lewei Yao1, Jianhua Han3, Hang Xu2, Lingpeng Kong2, Tong Zhang1
  1The Hong Kong University of Science and Technology 2The University of Hong Kong 3Shanghai Jiao Tong University
- LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent [arXiv 2023]
  Jianing Yang1, Xuweiyi Chen1, Shengyi Qian1, Nikhil Madaan, Madhavan Iyengar1, David F. Fouhey1,2, Joyce Chai1
  1University of Michigan, 2New York University
- Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning [ICML 2023]
  Thomas Carta1*, Clément Romac1,2, Thomas Wolf2, Sylvain Lamprier3, Olivier Sigaud4, Pierre-Yves Oudeyer1
  1Inria (Flowers), University of Bordeaux, 2Hugging Face, 3Univ Angers, LERIA, SFR MATHSTIC, F-49000, 4Sorbonne University, ISIR
- Learning Affordance Landscapes for Interaction Exploration in 3D Environments [NeurIPS 2020] [Project page]
  Tushar Nagarajan, Kristen Grauman
  UT Austin and Facebook AI Research
- Embodied Question Answering in Photorealistic Environments with Point Cloud Perception [CVPR 2019 (oral)] [Slides]
  Erik Wijmans1†, Samyak Datta1, Oleksandr Maksymets2†, Abhishek Das1, Georgia Gkioxari2, Stefan Lee1, Irfan Essa1, Devi Parikh1,2, Dhruv Batra1,2
  1Georgia Institute of Technology, 2Facebook AI Research
- Multi-Target Embodied Question Answering [CVPR 2019]
  Licheng Yu1, Xinlei Chen3, Georgia Gkioxari3, Mohit Bansal1, Tamara L. Berg1,3, Dhruv Batra2,3
  1University of North Carolina at Chapel Hill 2Georgia Tech 3Facebook AI
- Neural Modular Control for Embodied Question Answering [CoRL 2018 (Spotlight)] [Project page] [Github]
  Abhishek Das1, Georgia Gkioxari2, Stefan Lee1, Devi Parikh1,2, Dhruv Batra1,2
  1Georgia Institute of Technology 2Facebook AI Research
- Embodied Question Answering [CVPR 2018 (oral)] [Project page] [Github]
  Abhishek Das1, Samyak Datta1, Georgia Gkioxari2, Stefan Lee1, Devi Parikh2,1, Dhruv Batra2
  1Georgia Institute of Technology, 2Facebook AI Research
- A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search [ICLR 2023]
  Brandon Trabucco1, Gunnar A Sigurdsson2, Robinson Piramuthu2, Gaurav S. Sukhatme2,3, Ruslan Salakhutdinov1
  1CMU, 2Amazon Alexa AI, 3University of Southern California
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [arXiv 2023] [Project page] [Github]
  Yufei Wang1, Zhou Xian1, Feng Chen2, Tsun-Hsuan Wang3, Yian Wang4, Katerina Fragkiadaki1, Zackory Erickson1, David Held1, Chuang Gan4,5
  1CMU, 2Tsinghua IIIS, 3MIT CSAIL, 4UMass Amherst, 5MIT-IBM AI Lab
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning [ICLR 2021] [Project page] [Github]
  Mohit Shridhar†, Xingdi Yuan♡, Marc-Alexandre Côté♡, Yonatan Bisk‡, Adam Trischler♡, Matthew Hausknecht♣
  †University of Washington ♡Microsoft Research, Montréal ‡Carnegie Mellon University ♣Microsoft Research
- ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks [CVPR 2020] [Project page] [Github]
  Mohit Shridhar1, Jesse Thomason1, Daniel Gordon1, Yonatan Bisk1,2,3, Winson Han3, Roozbeh Mottaghi1,3, Luke Zettlemoyer1, Dieter Fox1,4
  1Paul G. Allen School of Computer Sci. & Eng., Univ. of Washington, 2Language Technologies Institute @ Carnegie Mellon University, 3Allen Institute for AI, 4NVIDIA
- VIMA: Robot Manipulation with Multimodal Prompts [ICML 2023] [Project page] [Github] [VIMA-Bench]
  Yunfan Jiang1, Agrim Gupta1†, Zichen Zhang2†, Guanzhi Wang3,4†, Yongqiang Dou5, Yanjun Chen1, Li Fei-Fei1, Anima Anandkumar3,4, Yuke Zhu3,6‡, Linxi Fan3‡
- SQA3D: Situated Question Answering in 3D Scenes [ICLR 2023] [Project page] [Slides] [Github]
  Xiaojian Ma2, Silong Yong1,3*, Zilong Zheng1, Qing Li1, Yitao Liang1,4, Song-Chun Zhu1,2,3,4, Siyuan Huang1
  1Beijing Institute for General Artificial Intelligence (BIGAI) 2UCLA 3Tsinghua University 4Peking University
- IQA: Visual Question Answering in Interactive Environments [CVPR 2018] [Github] [Demo video (YouTube)]
  Daniel Gordon1, Aniruddha Kembhavi2, Mohammad Rastegari2,4, Joseph Redmon1, Dieter Fox1,3, Ali Farhadi1,2
  1Paul G. Allen School of Computer Science, University of Washington 2Allen Institute for Artificial Intelligence 3Nvidia 4Xnor.ai
- Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments [ICCV 2021] [Project page] [Github]
  Difei Gao1,2, Ruiping Wang1,2,3, Ziyi Bai1,2, Xilin Chen1,
  1Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, 2University of Chinese Academy of Sciences, 3Beijing Academy of Artificial Intelligence
- AI2-THOR: An Interactive 3D Environment for Visual AI [arXiv 2022] [Project page] [Github]
  Allen Institute for AI, University of Washington, Stanford University, Carnegie Mellon University
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [IROS 2021] [Project page] [Github]
  Bokui Shen*, Fei Xia*, et al.
- Habitat: A Platform for Embodied AI Research [ICCV 2019] [Project page] [Habitat-Sim] [Habitat-Lab] [Habitat Challenge]
  Facebook AI Research, Facebook Reality Labs, Georgia Institute of Technology, Simon Fraser University, Intel Labs, UC Berkeley
- Habitat 2.0: Training Home Assistants to Rearrange their Habitat [NeurIPS 2021] [Project page]
  Facebook AI Research, Georgia Tech, Intel Research, Simon Fraser University, UC Berkeley
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models [ICLR 2023]
  Google Research, Brain Team
- ReAct: Synergizing Reasoning and Acting in Language Models [ICLR 2023]
  Shunyu Yao1∗, Jeffrey Zhao2, Dian Yu2, Nan Du2, Izhak Shafran2, Karthik Narasimhan1, Yuan Cao2
  1Department of Computer Science, Princeton University, 2Google Research, Brain Team
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models [arXiv 2023]
  Virginia Tech, Microsoft
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models [arXiv 2023]
  ETH Zurich, Cledar, Warsaw University of Technology
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models [arXiv 2023]
  Shunyu Yao1, Dian Yu2, Jeffrey Zhao2, Izhak Shafran2, Thomas L. Griffiths1, Yuan Cao2, Karthik Narasimhan1
  1Princeton University, 2Google DeepMind
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [NeurIPS 2022]
  Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou
  Google Research, Brain Team
- MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge [NeurIPS 2022] [Github] [Project page] [Knowledge Base]
  Linxi Fan1, Guanzhi Wang2∗, Yunfan Jiang3*, Ajay Mandlekar1, Yuncong Yang4, Haoyi Zhu5, Andrew Tang4, De-An Huang1, Yuke Zhu1,6†, Anima Anandkumar1,2†
  1NVIDIA, 2Caltech, 3Stanford, 4Columbia, 5SJTU, 6UT Austin
- Distilling Internet-Scale Vision-Language Models into Embodied Agents [ICML 2023]
  Theodore Sumers1∗, Kenneth Marino2, Arun Ahuja2, Rob Fergus2, Ishita Dasgupta2
- LISA: Reasoning Segmentation via Large Language Model [arXiv 2023] [Github] [Huggingface Models] [Dataset] [Online Demo]
  Xin Lai1, Zhuotao Tian2, Yukang Chen1, Yanwei Li1, Yuhui Yuan3, Shu Liu2, Jiaya Jia1,2
  1The Chinese University of Hong Kong 2SmartMore 3MSRA
[1] Trend figure from this repo.
[2] Figure from this paper: The Rise and Potential of Large Language Model Based Agents: A Survey.