EnvInteractiveLMPapers


EnvInteractive and Decision Making (personal usage)

A collection of papers on methods that use language to interact with an environment, including the real world, simulated worlds, or the WWW (🏄).

Theses

  1. Grounding natural language with autonomous interaction, Karthik Rajagopal Narasimhan, [pdf], 2017
  2. Continually Improving Grounded Natural Language Understanding through Human-Robot Dialog, Jesse David Thomason, [pdf], 2018
  3. Using Natural Language to Aid Task Specification in Sequential Decision Making Problems, Prasoon Goyal, [pdf], 2022

Papers

  1. World of Bits: An Open-Domain Platform for Web-Based Agents Arxiv.

    Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, Percy Liang [pdf] 2017

  2. ALFWorld: Aligning Text and Embodied Environments for Interactive Learning Arxiv.

    Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht [pdf] 2020.10

  3. WebGPT: Browser-assisted question-answering with human feedback Arxiv.

    Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman [pdf] 2021.12

  4. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents Arxiv.

    Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch [pdf] 2022.1

  5. FLIN: A Flexible Natural Language Interface for Web Navigation Arxiv.

    Sahisnu Mazumder, Oriana Riva [pdf] 2022.2

  6. Improving Intrinsic Exploration with Language Abstractions. Arxiv.

    Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette [pdf] 2022.2

  7. Pre-Trained Language Models for Interactive Decision-Making. Arxiv.

    Shuang Li, Xavier Puig, Yilun Du, Clinton Wang, Ekin Akyurek, Antonio Torralba, Jacob Andreas, Igor Mordatch [pdf] 2022.2

  8. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. Arxiv.

    Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan [pdf] 2022.4

  9. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language Arxiv.

    Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence [pdf] 2022.5

  10. Few-shot Subgoal Planning with Language Models. Arxiv.

    Lajanugen Logeswaran, Yao Fu, Moontae Lee, Honglak Lee [pdf] 2022.5

  11. A Generalist Agent. Arxiv.

    Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas [pdf] 2022.5

  12. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos. Arxiv.

    Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune [pdf] 2022.6

  13. MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge. Arxiv.

    Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar [pdf] 2022.6

  14. Inner Monologue: Embodied Reasoning through Planning with Language Models Arxiv.

    Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter [pdf] 2022.7

  15. LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action. Arxiv.

    Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine [pdf] 2022.7

  16. JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents Arxiv.

    Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang [pdf] 2022.8

  17. ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. Arxiv.

    Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, Animesh Garg [pdf] 2022.9

  18. On Grounded Planning for Embodied Tasks with Language Models. Arxiv.

    Bill Yuchen Lin, Chengsong Huang, Qian Liu, Wenda Gu, Sam Sommerer, Xiang Ren [pdf] 2022.9

  19. Code as Policies: Language Model Programs for Embodied Control. Arxiv.

    Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng [pdf] 2022.9

  20. Open-vocabulary Queryable Scene Representations for Real World Planning. Arxiv.

    Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S. Ryoo, Austin Stone, Daniel Kappler [pdf] 2022.9

  21. ReAct: Synergizing Reasoning and Acting in Language Models. Arxiv.

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao [pdf] 2022.10

  22. Mind's Eye: Grounded Language Model Reasoning through Simulation. Arxiv.

    Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai [pdf] 2022.10

  23. Interactive Language: Talking to Robots in Real Time. Arxiv.

    Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, Pete Florence [pdf] 2022.10

  24. Planning with Large Language Models via Corrective Re-prompting. Arxiv.

    Shreyas Sundara Raman, Vanya Cohen, Eric Rosen, Ifrah Idrees, David Paulius, Stefanie Tellex [pdf] 2022.11

  25. LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. Arxiv.

    Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su [pdf] 2022.12

  26. Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments. Arxiv.

    Yu Gu, Xiang Deng, Yu Su [pdf] 2022.12

  27. Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents. Arxiv.

    Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang [pdf] 2023.2

  28. Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning. Arxiv.

    Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, Pierre-Yves Oudeyer [pdf] 2023.2

  29. Collaborating with language models for embodied reasoning. Arxiv.

    Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, Rob Fergus [pdf] 2023.2

  30. Guiding Pretraining in Reinforcement Learning with Large Language Models. Arxiv.

    Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas [pdf] 2023.2

  31. Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals. Arxiv.

    Yue Wu, Yewen Fan, Paul Pu Liang, Amos Azaria, Yuanzhi Li, Tom M. Mitchell [pdf] 2023.2

  32. Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control. Arxiv.

    Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, Brian Ichter [pdf] 2023.3

  33. PaLM-E: An Embodied Multimodal Language Model. Arxiv.

    Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence [pdf] 2023.3

  34. Foundation Models for Decision Making: Problems, Methods, and Opportunities. Arxiv.

    Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans [pdf] 2023.3

  35. GPT-4 Technical Report. Arxiv.

    OpenAI [pdf] 2023.3

[THE END]?

  1. Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models Arxiv.

    Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan [pdf] 2023.3

  2. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace Arxiv.

    Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang [pdf] 2023.3

  3. AutoGPT GitHub.

    https://github.com/Significant-Gravitas/Auto-GPT/graphs/contributors [code] 2023.3

  4. API-Bank: A Benchmark for Tool-Augmented LLMs. Arxiv.

    Minghao Li, Feifan Song, Bowen Yu, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li [pdf] 2023.4

  5. Tool Learning with Foundation Models. Arxiv.

    Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, Maosong Sun [pdf] 2023.4

  6. LLM as A Robotic Brain: Unifying Egocentric Memory and Control. Arxiv.

    Jinjie Mai, Jun Chen, Bing Li, Guocheng Qian, Mohamed Elhoseiny, Bernard Ghanem [pdf] 2023.4

  7. Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents. Arxiv.

    Yue Wu, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Yuanzhi Li, Tom Mitchell, Shrimai Prabhumoye [pdf] 2023.5

  8. SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning. Arxiv.

    Yue Wu, Shrimai Prabhumoye, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Tom Mitchell, Yuanzhi Li [pdf] 2023.5

  9. Mind2Web: Towards a Generalist Agent for the Web. Arxiv.

    Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, Yu Su [pdf] 2023.6

  10. A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis. Arxiv.

    Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust [pdf] 2023.7

  11. WebArena: A Realistic Web Environment for Building Autonomous Agents. Arxiv.

    Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig [pdf] 2023.7