Awesome Model-Based Reinforcement Learning

This is a collection of research papers for model-based reinforcement learning (mbrl). And the repository will be continuously updated to track the frontier of model-based rl.

Welcome to follow and star!

[2023.02.05] New: We update the ICLR 2023 paper list of model-based rl!

[2022.11.03] We update the NeurIPS 2022 paper list of model-based rl.

[2022.07.06] We update the ICML 2022 paper list of model-based rl.

[2022.02.13] We update the ICLR 2022 paper list of model-based rl.

[2021.12.28] We release the awesome model-based rl.

A Taxonomy of Model-Based RL Algorithms
Papers
- Classic Model-Based RL Papers
- ICLR 2023(New!!!)
- NeurIPS 2022
- ICML 2022
- ICLR 2022
- NeurIPS 2021
- ICLR 2021
- ICML 2021
- Other
Contributing

A Taxonomy of Model-Based RL Algorithms

We’ll start this section with a disclaimer: it’s really quite hard to draw an accurate, all-encompassing taxonomy of algorithms in the Model-Based RL space, because the modularity of algorithms is not well-represented by a tree structure. So we will publish a series of related blogs to explain more Model-Based RL algorithms.

A non-exhaustive, but useful taxonomy of algorithms in modern Model-Based RL.

We simply divide Model-Based RL into two categories: Learn the Model and Given the Model.

Learn the Model mainly focuses on how to build the environment model.
Given the Model cares about how to utilize the learned model.

And we give some examples as shown in the figure above. There are links to algorithms in taxonomy.

[1] World Models: Ha and Schmidhuber, 2018
[2] I2A (Imagination-Augmented Agents): Weber et al, 2017
[3] MBMF (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al, 2017
[4] MBVE (Model-Based Value Expansion): Feinberg et al, 2018
[5] ExIt (Expert Iteration): Anthony et al, 2017
[6] AlphaZero: Silver et al, 2017
[7] POPLIN (Model-Based Policy Planning): Wang et al, 2019
[8] M2AC (Masked Model-based Actor-Critic): Pan et al, 2020

Papers

format:
- [title](paper link) [links]
  - author1, author2, and author3
  - Key: key problems and insights
  - OpenReview: optional
  - ExpEnv: experiment environments

Classic Model-Based RL Papers

Dyna, an integrated architecture for learning, planning, and reacting
- Richard S. Sutton. ACM 1991
- Key: dyna architecture
- ExpEnv: None
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
- Marc Peter Deisenroth, Carl Edward Rasmussen. ICML 2011
- Key: probabilistic dynamics model
- ExpEnv: cart-pole system, robotic unicycle
Learning Complex Neural Network Policies with Trajectory Optimization
- Sergey Levine, Vladlen Koltun. ICML 2014
- Key: guided policy search
- ExpEnv: mujoco
Learning Continuous Control Policies by Stochastic Value Gradients
- Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez. NIPS 2015
- Key: backpropagation through paths, gradient on real trajectory
- ExpEnv: mujoco
Value Prediction Network
- Junhyuk Oh, Satinder Singh, Honglak Lee. NIPS 2017
- Key: value-prediction model
- ExpEnv: collect domain, atari
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
- Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee. NIPS 2018
- Key: ensemble model and Qnet, value expansion
- ExpEnv: mujoco, roboschool
Recurrent World Models Facilitate Policy Evolution
- David Ha, Jürgen Schmidhuber. NIPS 2018
- Key: vae(representation), rnn(predictive model)
- ExpEnv: car racing, vizdoom
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine. NIPS 2018
- Key: probabilistic ensembles with trajectory sampling
- ExpEnv: cartpole, mujoco
When to Trust Your Model: Model-Based Policy Optimization
- Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine. NeurIPS 2019
- Key: ensemble model, sac, k-branched rollout
- ExpEnv: mujoco
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
- Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma. ICLR 2019
- Key: Discrepancy Bounds Design, ME-TRPO with multi-step, Entropy regularization
- ExpEnv: mujoco
Model-Ensemble Trust-Region Policy Optimization
- Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel. ICLR 2018
- Key: ensemble model, TRPO
- ExpEnv: mujoco
Dream to Control: Learning Behaviors by Latent Imagination
- Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi. ICLR 2019
- Key: DreamerV1, latent space imagination
- ExpEnv: deepmind control suite, atari, deepmind lab
Exploring Model-based Planning with Policy Networks
- Tingwu Wang, Jimmy Ba. ICLR 2020
- Key: model-based policy planning in action space and parameter space
- ExpEnv: mujoco
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. Nature 2020
- Key: MCTS, value equivalence
- ExpEnv: chess, shogi, go, atari

ICLR 2023

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
- Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner
- Key: model-based offline, bayesian posterior value estimate
- OpenReview: 8, 8, 6, 6
- ExpEnv: d4rl
User-Interactive Offline Reinforcement Learning
- Phillip Swazinna, Steffen Udluft, Thomas Runkler
- Key: let the user adapt the policy behavior after training is finished
- OpenReview: 10, 8, 6, 3
- ExpEnv: 2d-world, industrial benchmark
CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning
- Sheng Yue, Guanbo Wang, Wei Shao, Zhaofeng Zhang, Sen Lin, Ju Ren, Junshan Zhang
- Key: offline IRL, reward extrapolation error
- OpenReview: 8, 8, 6, 6
- ExpEnv: d4rl
Efficient Offline Policy Optimization with a Learned Model
- Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng YAN, Zhongwen Xu
- Key: offline rl, analysis of MuZero Unplugged, one-step look-ahead policy improvement
- OpenReview: 8, 6, 5
- ExpEnv: atari dataset
Efficient Planning in a Compact Latent Action Space
- zhengyao jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian
- Key: planning with VQ-VAE
- OpenReview: 6, 6, 6, 6
- ExpEnv: d4rl dataset
Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
- Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang
- Key: lipschitz regularization
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations
- Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran
- Key: three phases -- policy pretraining, targeted exploration, interactive learning
- OpenReview: 8, 6, 6, 6
- ExpEnv: adroit, meta-world, deepmind control suite
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
- Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov
- Key: Aligned Latent Models
- OpenReview: 8, 6, 6, 6, 6
- ExpEnv: mujoco

Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning
- Daniel Palenicek, Michael Lutter, Joao Carvalho, Jan Peters
- Key: longer horizons yield diminishing returns in terms of sample efficiency
- OpenReview: 8, 6, 6, 6
- ExpEnv: brax
Planning Goals for Exploration
- Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman
- Key: sampling-based planning, set goals for each training episode to directly optimize an intrinsic exploration reward
- OpenReview: 8, 8, 8, 8, 6
- ExpEnv: point maze, walker, ant maze, 3-block stack
Making Better Decision by Directly Planning in Continuous Control
- Jinhua Zhu, Yue Wang, Lijun Wu, Tao Qin, Wengang Zhou, Tie-Yan Liu, Houqiang Li
- Key: deep differentiable dynamic programming planner
- OpenReview: 8, 8, 8, 6
- ExpEnv: mujoco
Latent Variable Representation for Reinforcement Learning
- Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, sujay sanghavi, Dale Schuurmans, Bo Dai
- Key: variational learning, representation learning
- OpenReview: 8, 6, 6, 3
- ExpEnv: mujoco, deepmind control suite
SpeedyZero: Mastering Atari with Limited Data and Time
- Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu
- Key: distributed model-based rl, speed up EfficientZero
- OpenReview: 6, 6, 5
- ExpEnv: atari 100k
Transformer-based World Models Are Happy With 100k Interactions
- Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling
- Key: autoregressive world model, Transformer-XL, balanced cross-entropy loss, balanced dataset sampling
- OpenReview: 8, 6, 6, 6
- ExpEnv: atari 100k
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
- Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
- Key: offline multi-task pretraining, online finetuning
- OpenReview: 6, 6, 6, 6
- ExpEnv: atari 100k
Become a Proficient Player with Limited Data through Watching Pure Videos
- Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao
- Key: unsupervised pre-training, finetune with down-stream tasks
- OpenReview: 8, 6, 6, 5
- ExpEnv: atari 100k
EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model
- Yifu Yuan, Jianye HAO, Fei Ni, Yao Mu, YAN ZHENG, Yujing Hu, Jinyi Liu, Yingfeng Chen, Changjie Fan
- Key: jointly pretrain the multi-headed dynamics model and unsupervised exploration policy, finetune to downstream tasks
- OpenReview: 6, 6, 6, 6
- ExpEnv: URLB benchmark

NeurIPS 2022

Bidirectional Learning for Offline Infinite-width Model-based Optimization
- Can Chen, Yingxue Zhang, Jie Fu, Xue Liu, Mark Coates
- Key: model-based, offline
- OpenReview: 7, 6, 5
- ExpEnv: design-bench
A Unified Framework for Alternating Offline Model Training and Policy Learning
- Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou
- Key: model-based, offline, marginal importance weight
- OpenReview: 7, 6, 6, 5
- ExpEnv: d4rl dataset
Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
- Kaiyang Guo, Shao Yunfeng, Yanhui Geng
- Key: model-based, offline
- OpenReview: 8, 8, 7, 7
- ExpEnv: d4rl dataset
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
- Jiafei Lyu, Xiu Li, Zongqing Lu
- Key: double check mechanism, bidirectional modeling, offline RL
- OpenReview: 7, 6, 6
- ExpEnv: d4rl dataset
Model-Based Opponent Modeling
- XiaoPeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu
- Key: multi-agent, model-based
- OpenReview: 7, 6, 4, 3
- ExpEnv: mpe, google research football
Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning
- Zhiwei Xu, Dapeng Li, Bin Zhang, Yuan Zhan, Yunpeng Bai, Guoliang Fan
- Key: multi-agent, model-based
- OpenReview: 6, 5
- ExpEnv: StarCraft II, Google Research Football, Multi-Agent Discrete MuJoCo
MoCoDA: Model-based Counterfactual Data Augmentation
- Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg
- Key: data augmentation framework, offline RL
- OpenReview: 7, 7, 7, 6
- ExpEnv: 2D Navigation, Hook-Sweep
When to Update Your Model: Constrained Model-based Reinforcement Learning
- Tianying Ji, Yu Luo, Fuchun Sun, Mingxuan Jing, Fengxiang He, Wenbing Huang
- Key: event-triggered mechanism, constrained model-shift lower-bound optimization
- OpenReview: 6, 6, 5, 5
- ExpEnv: mujoco
Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm
- Ashish Jayant, Shalabh Bhatnagar
- Key: constrained RL, model-based
- OpenReview: 7, 6, 5, 5
- ExpEnv: safety gym
Learning to Attack Federated Learning: A Model-based Reinforcement Learning Attack Framework
- Henger Li, Xiaolin Sun, Zizhan Zheng
- Key: attack & defense, federated learning, model-based
- OpenReview: 6, 6, 6, 5
- ExpEnv: MNIST, FashionMNIST, EMNIST, CIFAR-10 and synthetic dataset
Model-Based Imitation Learning for Urban Driving
- Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zachary Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, Jamie Shotton
- Key: model-based, imitation learning, autonomous driving
- OpenReview: 7, 6, 6
- ExpEnv: CARLA
Data-Driven Model-Based Optimization via Invariant Representation Learning
- Han Qi, Yi Su, Aviral Kumar, Sergey Levine
- Key: domain adaptation, invariant objective models, representation learning (no about model-based RL)
- OpenReview: 7, 6, 6, 5, 5
- ExpEnv: design-bench
Model-based Lifelong Reinforcement Learning with Bayesian Exploration
- Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris
- Key: lifelong RL, variational bayesian
- OpenReview: 7, 6, 6
- ExpEnv: mujoco, meta-world
Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning
- Zifan Wu, Chao Yu, Chen Chen, Jianye Hao, Hankz Hankui Zhuo
- Key: treat the model rollout process as a sequential decision making problem
- OpenReview: 7, 7, 6, 6
- ExpEnv: mujoco, d4rl
Joint Model-Policy Optimization of a Lower Bound for Model-Based RL
- Benjamin Eysenbach, Alexander Khazatsky, Sergey Levine, Russ Salakhutdinov
- Key: unified objective for model-based RL
- OpenReview: 8, 8, 7, 6
- ExpEnv: gridworld, mujoco, ROBEL manipulation
RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
- Marc Rigter, Bruno Lacerda, Nick Hawes
- Key: offline rl, model-based rl, two-player game, adversarial model training
- OpenReview: 6, 6, 6, 4
- ExpEnv: d4rl
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
- Shenao Zhang
- Key: posterior sampling RL, referential update, constrained conservative update
- OpenReview: 7, 7, 5, 5
- ExpEnv: mujoco, N-Chain MDPs
Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning
- Chenyang Wu, Tianci Li, Zongzhang Zhang, Yang Yu
- Key: optimism in the face of uncertainty(OFU), BOO Regret
- OpenReview: 6, 6, 5
- ExpEnv: RiverSwim, Chain, Random MDPs
Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
- Alekh Agarwal, Tong Zhang
- Key: posterior sampling RL, Bellman error decoupling framework
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
Exponential Family Model-Based Reinforcement Learning via Score Matching
- Gene Li, Junbo Li, Nathan Srebro, Zhaoran Wang, Zhuoran Yang
- Key: optimistic model-based, score matching
- OpenReview: 7, 7, 6
- ExpEnv: None
Deep Hierarchical Planning from Pixels
- Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
- Key: hierarchical RL, long-horizon and sparse reward tasks
- OpenReview: 6, 6, 5
- ExpEnv: atari, deepmind control suite, deepmind lab, crafter

ICML 2022

DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
- Fei Deng, Ingook Jang, Sungjin Ahn
- Key: dreamer, prototypes
- ExpEnv: deepmind control suite
Denoised MDPs: Learning World Models Better Than the World Itself
- Tongzhou Wang, Simon Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
- Key: representation learning, denoised model
- ExpEnv: deepmind control suite, RoboDesk
Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models and Amortized Policy Search
- Qi Wang, Herke van Hoof
- Key: graph structured surrogate model, meta training
- ExpEnv: atari, mujoco
Towards Adaptive Model-Based Reinforcement Learning
- Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen
- Key: local change adaptation
- ExpEnv: GridWorldLoCA, ReacherLoCA, MountaincarLoCA
Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation
- Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause
- Key: model-based multi-agent, confidence bound
- ExpEnv: SMART
Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
- Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou
- Key: offline rl, model-based rl, stationary distribution regularization
- ExpEnv: d4rl
Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization
- Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine
- Key: benchmark, offline MBO
- ExpEnv: Design-Bench Benchmark Tasks
Temporal Difference Learning for Model Predictive Control
- Nicklas Hansen, Hao Su, Xiaolong Wang
- Key: td-learning, MPC
- ExpEnv: deepmind control suite, Meta-World

ICLR 2022

Revisiting Design Choices in Offline Model Based Reinforcement Learning
- Cong Lu, Philip Ball, Jack Parker-Holder, Michael Osborne, Stephen J. Roberts
- Key: model-based offline, uncertainty quantification
- OpenReview: 8, 8, 6, 6, 6
- ExpEnv: d4rl dataset
Value Gradient weighted Model-Based Reinforcement Learning
- Claas A Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand
- Key: Value-Gradient weighted Model loss
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
Planning in Stochastic Environments with a Learned Model
- Ioannis Antonoglou, Julian Schrittwieser, Sherjil Ozair, Thomas K Hubert, David Silver
- Key: MCTS, stochastic MuZero
- OpenReview: 10, 8, 8, 5
- ExpEnv: 2048 game, Backgammon, Go
Policy improvement by planning with Gumbel
- Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver
- Key: Gumbel AlphaZero, Gumbel MuZero
- OpenReview: 8, 8, 8, 6
- ExpEnv: go, chess, atari
Model-Based Offline Meta-Reinforcement Learning with Regularization
- Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang
- Key: model-based offline Meta-RL
- OpenReview: 8, 6, 6, 6
- ExpEnv: d4rl dataset
On-Policy Model Errors in Reinforcement Learning
- Lukas Froehlich, Maksym Lefarov, Melanie Zeilinger, Felix Berkenkamp
- Key: model errors, on-policy corrections
- OpenReview: 8, 6, 6, 5
- ExpEnv: mujoco, pybullet
A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning
- Jiaxian Guo, Mingming Gong, Dacheng Tao
- Key: relational intervention, dynamics generalization
- OpenReview: 8, 8, 6, 6
- ExpEnv: Pendulum, mujoco
Information Prioritization through Empowerment in Visual Model-based RL
- Homanga Bharadhwaj, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine
- Key: mutual information, visual model-based RL
- OpenReview: 8, 8, 8, 6
- ExpEnv: deepmind control suite, Kinetics dataset
Transfer RL across Observation Feature Spaces via Model-Based Regularization
- Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew E Cohen, Furong Huang
- Key: latent dynamics model, transfer RL
- OpenReview: 8, 6, 5, 5
- ExpEnv: CartPole, Acrobot and Cheetah-Run, mujoco, 3DBall
Learning State Representations via Retracing in Reinforcement Learning
- Changmin Yu, Dong Li, Jianye HAO, Jun Wang, Neil Burgess
- Key: representation learning, learning via retracing
- OpenReview: 8, 6, 5, 3
- ExpEnv: deepmind control suite
Model-augmented Prioritized Experience Replay
- Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
- Key: prioritized experience replay, mbrl
- OpenReview: 8, 8, 6, 5
- ExpEnv: pybullet
Evaluating Model-Based Planning and Planner Amortization for Continuous Control
- Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller
- Key: model predictive control
- OpenReview: 8, 6, 6, 6
- ExpEnv: mujoco
Gradient Information Matters in Policy Optimization by Back-propagating through Model
- Chongchong Li, Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
- Key: two-model-based method, analyze model error and policy gradient
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
Pareto Policy Pool for Model-based Offline Reinforcement Learning
- Yijun Yang, Jing Jiang, Tianyi Zhou, Jie Ma, Yuhui Shi
- Key: model-based offline, model return-uncertainty trade-off
- OpenReview: 8, 8, 6, 5
- ExpEnv: d4rl dataset
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
- Masatoshi Uehara, Wen Sun
- Key: model-based offline theory, PAC bounds
- OpenReview: 8, 6, 6, 5
- ExpEnv: None
Know Thyself: Transferable Visual Control Policies Through Robot-Awareness
- Edward S. Hu, Kun Huang, Oleh Rybkin, Dinesh Jayaraman
- Key: world models that transfer to new robots
- OpenReview: 8, 6, 6, 5
- ExpEnv: mujoco, WidowX and Franka Panda robot

NeurIPS 2021

On Effective Scheduling of Model-based Reinforcement Learning
- Hang Lai, Jian Shen, Weinan Zhang, Yimin Huang, Xing Zhang, Ruiming Tang, Yong Yu, Zhenguo Li
- Key: extension of mbpo, hyper-controller learning
- OpenReview: 8, 6, 6
- ExpEnv: mujoco, pybullet
Safe Reinforcement Learning by Imagining the Near Future
- Garrett Thomas, Yuping Luo, Tengyu Ma
- Key: safe rl, reward penalty, theory about model-based rollouts
- OpenReview: 8, 6, 6
- ExpEnv: mujoco
Model-Based Reinforcement Learning via Imagination with Derived Memory
- Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Eben Li, Chongjie Zhang, Jianye HAO
- Key: extension of dreamer, prediction-reliability weight
- OpenReview: 6, 6, 6, 6
- ExpEnv: deepmind control suite
MobILE: Model-Based Imitation Learning From Observation Alone
- Rahul Kidambi, Jonathan Chang, Wen Sun
- Key: imitation learning from observations alone, mbrl
- OpenReview: 6, 6, 6, 4
- ExpEnv: cartpole, mujoco
Model-Based Episodic Memory Induces Dynamic Hybrid Controls
- Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh
- Key: model-based, episodic control
- OpenReview: 7, 7, 6, 6
- ExpEnv: 2D maze navigation, cartpole, mountainCar and lunarlander, atari, 3D navigation: gym-miniworld
A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
- Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio
- Key: mbrl, set representation
- OpenReview: 7, 7, 7, 6
- ExpEnv: MiniGrid-BabyAI framework
Mastering Atari Games with Limited Data
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
- Key: muzero, self-supervised consistency loss
- OpenReview: 7, 7, 7, 5
- ExpEnv: atrai 100k, deepmind control suite
Online and Offline Reinforcement Learning by Planning with a Learned Model
- Julian Schrittwieser, Thomas K Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver
- Key: muzero, reanalyse, offline
- OpenReview: 8, 8, 7, 6
- ExpEnv: atrai dataset, deepmind control suite dataset
Self-Consistent Models and Values
- Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
- Key: new model learning way
- OpenReview: 7, 7, 7, 6
- ExpEnv: tabular MDP, Sokoban, atari
Proper Value Equivalence
- Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, Satinder Singh
- Key: value equivalence, value-based planning, muzero
- OpenReview: 8, 7, 7, 6
- ExpEnv: four rooms, atari
MOPO: Model-based Offline Policy Optimization
- Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
- Key: model-based, offline
- OpenReview: None
- ExpEnv: d4rl dataset, halfcheetah-jump and ant-angle
RoMA: Robust Model Adaptation for Offline Model-based Optimization
- Sihyun Yu, Sungsoo Ahn, Le Song, Jinwoo Shin
- Key: model-based, offline
- OpenReview: 7, 6, 6
- ExpEnv: design-bench
Offline Reinforcement Learning with Reverse Model-based Imagination
- Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
- Key: model-based, offline
- OpenReview: 7, 6, 6, 5
- ExpEnv: d4rl dataset
Offline Model-based Adaptable Policy Learning
- Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye
- Key: model-based, offline
- OpenReview: 6, 6, 6, 4
- ExpEnv: d4rl dataset
Weighted model estimation for offline model-based reinforcement learning
- Toru Hishinuma, Kei Senda
- Key: model-based, offline, off-policy evaluation
- OpenReview: 7, 6, 6, 6
- ExpEnv: pendulum, d4rl dataset
Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
- Weitong Zhang, Dongruo Zhou, Quanquan Gu
- Key: learning theory, model-based reward-free RL, linear function approximation
- OpenReview: 6, 6, 5, 5
- ExpEnv: None
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
- Kefan Dong, Jiaqi Yang, Tengyu Ma
- Key: learning theory, model-based bandit RL, nonlinear function approximation
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
Discovering and Achieving Goals via World Models
- Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak
- Key: unsupervised goal reaching, goal-conditioned RL
- OpenReview: 6, 6, 6, 6, 6
- ExpEnv: walker, quadruped, bins, kitchen

ICLR 2021

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
- Key: model-based, behavior cloning (warmup), trpo
- OpenReview: 8, 7, 7, 5
- ExpEnv: d4rl dataset
Control-Aware Representations for Model-based Reinforcement Learning
- Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
- Key: representation learning, model-based soft actor-critic
- OpenReview: 6, 6, 6
- ExpEnv: planar system, inverted pendulum – swingup, cartpole, 3-link manipulator — swingUp & balance
Mastering Atari with Discrete World Models
- Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
- Key: DreamerV2, many tricks(multiple categorical variables, KL balancing, etc)
- OpenReview: 9, 8, 5, 4
- ExpEnv: atari
Model-Based Visual Planning with Self-Supervised Functional Distances
- Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
- Key: goal-reaching task, dynamics learning, distance learning (goal-conditioned Q-function)
- OpenReview: 7, 7, 7, 7
- ExpEnv: sawyer, door sliding
Model-Based Offline Planning
- Arthur Argenson, Gabriel Dulac-Arnold
- Key: model-based, offline
- OpenReview: 8, 7, 5, 5
- ExpEnv: RL Unplugged(RLU), d4rl dataset
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation
- Justin Fu, Sergey Levine
- Key: model-based, offline
- OpenReview: 8, 6, 6
- ExpEnv: design-bench
On the role of planning in model-based deep reinforcement learning
- Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber
- Key: discussion about planning in MuZero
- OpenReview: 7, 7, 6, 5
- ExpEnv: atari, go, deepmind control suite
Representation Balancing Offline Model-based Reinforcement Learning
- Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim
- Key: Representation Balancing MDP, model-based, offline
- OpenReview: 7, 7, 7, 6
- ExpEnv: d4rl dataset
Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?
- Balázs Kégl, Gabriel Hurtado, Albert Thomas
- Key: mixture density nets, heteroscedasticity
- OpenReview: 7, 7, 7, 6, 5
- ExpEnv: acrobot system

ICML 2021

Conservative Objective Models for Effective Offline Model-Based Optimization
- Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine
- Key: conservative objective model, offline mbrl
- ExpEnv: design-bench
Continuous-Time Model-Based Reinforcement Learning
- Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki
- Key: continuous-time
- ExpEnv: pendulum, cartPole and acrobot
Model-Based Reinforcement Learning via Latent-Space Collocation
- Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
- Key: latent space collocation
- ExpEnv: sparse metaworld tasks
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
- David A Bruns-Smith
- Key: worst-case bounds
- ExpEnv: ope-tools
Muesli: Combining Improvements in Policy Optimization
- Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
- Key: value equivalence
- ExpEnv: atari
Vector Quantized Models for Planning
- Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals
- Key: VQVAE, MCTS
- ExpEnv: chess datasets, DeepMind Lab
PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
- Yuda Song, Wen Sun
- Key: sample complexity, kernelized nonlinear regulators, linear MDPs
- ExpEnv: mountain car, antmaze, mujoco
Temporal Predictive Coding For Model-Based Planning In Latent Space
- Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon
- Key: temporal predictive coding with a RSSM, latent space
- ExpEnv: deepmind control suite
Model-based Reinforcement Learning for Continuous Control with Posterior Sampling
- Ying Fan, Yifei Ming
- Key: regret bound of psrl, mpc
- ExpEnv: continuous cartpole, pendulum swingup, mujoco
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
- Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
- Key: learning theory, multi-agent, model-based self play, two-player zero-sum Markov games
- ExpEnv: None

Other

Mastering Diverse Domains through World Models
- Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap. Arxiv 2023
- Key: DreamerV3, scaling property to world model
- ExpEnv: deepmind control suite, atari, DMLab, minecraft

Contributing

Our purpose is to make this repo even better. If you are interested in contributing, please refer to HERE for instructions in contribution.

License

Awesome Model-Based RL is released under the Apache 2.0 license.

philip-ndikum/awesome-model-based-RL