Paper Collection for Batch RL with brief introductions.
Batch RL resembles imitation learning in that it learns purely from pre-collected demonstrations (batch data) and does not need to interact with the environment.
It is also known as offline RL.
TBA
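To illustrate the batch setting concretely, here is a minimal tabular sketch of fitted Q-iteration (in the spirit of the FQI/NFQ papers below, not a faithful reproduction of any of them): the learner sees only a fixed set of transitions and never queries the environment. The function name and dataset layout are illustrative assumptions.

```python
import numpy as np

def fitted_q_iteration(dataset, n_states, n_actions, gamma=0.99, n_iters=50):
    """Tabular fitted Q-iteration over a fixed batch of transitions.

    dataset: list of (s, a, r, s_next, done) tuples. No environment
    interaction occurs -- the defining trait of batch/offline RL.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Build regression targets from the frozen batch and the current Q.
        targets = {}
        for s, a, r, s_next, done in dataset:
            y = r + (0.0 if done else gamma * q[s_next].max())
            targets.setdefault((s, a), []).append(y)
        q_new = q.copy()
        for (s, a), ys in targets.items():
            q_new[s, a] = np.mean(ys)  # least-squares fit = mean target
        q = q_new
    return q
```

Note that state-action pairs absent from the batch are never updated, which is exactly the extrapolation problem that methods like BCQ, BEAR, and CQL below are designed to address.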
- <Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems> by Sergey Levine, Aviral Kumar, George Tucker, Justin Fu, 2020.
- <Batch Reinforcement Learning> by Sascha Lange, Thomas Gabel, Martin Riedmiller, 2012.
- [LSPI] <Least-squares policy iteration> by Michail G. Lagoudakis, Ronald Parr, 2003.
- [FQI] <Tree-based batch mode reinforcement learning> by Damien Ernst, Pierre Geurts, Louis Wehenkel, 2005.
- [NFQ] <Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method> by Martin Riedmiller, 2005.
- <Exponentially Weighted Imitation Learning for Batched Historical Data> by Qing Wang, Jiechao Xiong, Lei Han, Peng Sun, Han Liu and Tong Zhang, NIPS 2018.
- [DQfD] <Deep Q-learning from Demonstrations> by Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys, 2017.
- [NAC] <Reinforcement Learning from Imperfect Demonstrations> by Yang Gao, Huazhe Xu, Ji Lin, Fisher Yu, Sergey Levine, Trevor Darrell, ICML 2018.
- [BEAR] <Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction> by Aviral Kumar, Justin Fu, George Tucker and Sergey Levine, NIPS 2019.
- [DualDICE] <DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections> by Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li, NIPS 2019.
- [SPIBB] <Safe policy improvement with baseline bootstrapping> by Romain Laroche, Paul Trichelair, Remi Tachet des Combes, ICML 2019.
- <Batch Policy Learning under Constraints> by Hoang M. Le, Cameron Voloshin, Yisong Yue, ICML 2019.
- [BCQ] <Off-Policy Deep Reinforcement Learning without Exploration> by Scott Fujimoto, David Meger and Doina Precup, ICML 2019.
- <Truly Batch Apprenticeship Learning with Deep Successor Features> by Donghun Lee, Srivatsan Srinivasan and Finale Doshi-Velez, IJCAI 2019.
- [BCQ-Discrete] <Benchmarking Batch Deep Reinforcement Learning Algorithms> by Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau, 2019.
- <On Value Discrepancy of Imitation Learning> by Tian Xu, Ziniu Li, Yang Yu, 2019.
- [AWR] <Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning> by Xue Bin Peng, Aviral Kumar, Grace Zhang and Sergey Levine, 2019.
- [BRAC] <Behavior Regularized Offline Reinforcement Learning> by Yifan Wu, George Tucker, Ofir Nachum, 2019.
- [AlgaeDICE] <AlgaeDICE: Policy Gradient from Arbitrary Experience> by Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans, 2019.
- [2IWIL] <Imitation Learning from Imperfect Demonstration> by Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama, 2019.
- [ABM] <Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning> by Siegel et al., ICLR 2020.
- [GenDICE] <GenDICE: Generalized Offline Estimation of Stationary Values> by Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans, ICLR 2020.
- [GradientDICE] <GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values> by Shangtong Zhang, Bo Liu, Shimon Whiteson, ICML 2020.
- [REM] <An Optimistic Perspective on Offline Reinforcement Learning> by Rishabh Agarwal, Dale Schuurmans and Mohammad Norouzi, ICML 2020.
- [BOPAH] <Batch Reinforcement Learning with Hyperparameter Gradients> by Byung-Jun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim, ICML 2020.
- [OFENet] <Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?> by Kei Ota, Tomoaki Oiki, Devesh K. Jha, Toshisada Mariyama, Daniel Nikovski, ICML 2020.
- [PFQI] <Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning> by Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli, ICML 2020.
- [PSEC-TD(0)] <Reducing Sampling Error in Batch Temporal Difference Learning> by Brahma S. Pavse, Ishan Durugkar, Josiah P. Hanna, Peter Stone, ICML 2020.
- <Provably Good Batch Reinforcement Learning Without Great Exploration> by Yao Liu, Adith Swaminathan, Alekh Agarwal and Emma Brunskill, NIPS 2020.
- [ESRL] <Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation> by Aaron Sonabend-W, Junwei Lu, Leo A. Celi, Tianxi Cai, Peter Szolovits, NIPS 2020.
- [MBML] <Multi-Task Batch Reinforcement Learning with Metric Learning> by Jiachen Li et al., NIPS 2020.
- [BAIL] <BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning> by Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, Keith Ross, NIPS 2020.
- [EDM] <Strictly Batch Imitation Learning by Energy-based Distribution Matching> by Daniel Jarrett, Ioana Bica and Mihaela van der Schaar, NIPS 2020.
- [AWAC] <Accelerating Online Reinforcement Learning with Offline Datasets> by Ashvin Nair, Murtaza Dalal, Abhishek Gupta, Sergey Levine, 2020.
- [CQL] <Conservative Q-Learning for Offline Reinforcement Learning> by Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine, 2020.
- [BREMEN] <Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization> by Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Shane Gu, 2020.
- [UWAC] <Uncertainty Weighted Offline Reinforcement Learning> by Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh, 2020.
- [CRR] <Critic Regularized Regression> by Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas, NIPS 2020.
- [DAC-MDP] <DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs> by Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern, ICLR 2021.
- [OPAL] <OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning> by Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum, ICLR 2021.
- [O-RAAC] <Risk-Averse Offline Reinforcement Learning> by Nuria Armengol Urpi, Sebastian Curi, Andreas Krause, ICLR 2021.
- <What are the Statistical Limits of Offline RL with Linear Function Approximation?> by Ruosong Wang, Dean P. Foster, Sham M. Kakade, ICLR 2021.
- [MOReL] <MOReL: Model-Based Offline Reinforcement Learning> by Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli and Thorsten Joachims, NIPS 2020.
- [MOPO] <MOPO: Model-based Offline Policy Optimization> by Tianhe Yu et al., NIPS 2020.
- [MOOSE] <Overcoming Model Bias for Robust Offline Deep Reinforcement Learning> by Phillip Swazinna, Steffen Udluft, Thomas Runkler, 2020.
- <Model-Based Offline Planning> by Arthur Argenson, Gabriel Dulac-Arnold, ICLR 2021.
- <Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization> by Michael R Zhang, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi, ICLR 2021.
- [COMBO] <COMBO: Conservative Offline Model-Based Policy Optimization> by Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn, 2021.
- [D4RL] <D4RL: Datasets for deep data-driven reinforcement learning> by Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine, 2020.
- <RL Unplugged: Benchmarks for offline reinforcement learning> by Caglar Gulcehre, et al., 2020.
- [NeoRL] <NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning> by Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, Yang Yu, 2021.
- <Scaling data-driven robotics with reward sketching and batch reinforcement learning> by Serkan Cabi, et al., 2019.
- <Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog> by Natasha Jaques et al., 2019.
- <IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data> by Ajay Mandlekar, et al., 2020.