Related papers for offline reinforcement learning (we mainly focus on representation learning, sequence modeling, and conventional offline RL)
Anyone is welcome to submit a pull request for related, unlisted papers on offline RL that are published at peer-reviewed conferences (ICML/NeurIPS/ICLR/CVPR, etc.) or released on arXiv.
- Provable Representation Learning for Imitation with Contrastive Fourier Features
Proposes a contrastive learning method for representing states from an offline dataset. The authors prove that optimizing this state-representation function together with the RL objective lowers an upper bound on the distance between the learned policy and the target policy that collected the offline dataset, without assuming any particular form for the target policy's representation; in theory the error can be driven to zero. It substantially outperforms behavior cloning (bisimulation-based) baselines, with experiments on both the tabular case and DQN agents trained on Atari 2600 (learning dynamics and reward). BC restricts how the target policy's data is collected, and because the dataset does not fully cover a large state space, its sample efficiency is low. A generic contrastive-representation sketch is given below.
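The paper's exact objective is built on contrastive Fourier features, but the general idea of contrastively learned state representations can be illustrated with a generic InfoNCE-style loss over offline transitions. The sketch below is only an illustration under that assumption; `StateEncoder`, the network sizes, and the temperature are hypothetical and not taken from the paper.

```python
# Generic InfoNCE-style contrastive objective over offline transitions (s, s').
# Illustrative sketch only; not the paper's Fourier-feature construction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateEncoder(nn.Module):
    """Maps raw states to representation vectors (hypothetical architecture)."""
    def __init__(self, state_dim: int, repr_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, repr_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def contrastive_loss(encoder: StateEncoder, s: torch.Tensor, s_next: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: each state is pulled toward its own next state and pushed
    away from the other next states in the batch (used as negatives)."""
    z = F.normalize(encoder(s), dim=-1)            # (B, D) anchor representations
    z_next = F.normalize(encoder(s_next), dim=-1)  # (B, D) positive representations
    logits = z @ z_next.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(z.shape[0], device=z.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```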
- Pretraining Representations for Data-Efficient Reinforcement Learning
- Representation Matters: Offline Pretraining for Sequential Decision Making
- Reinforcement Learning as One Big Sequence Modeling Problem
- Decision Transformer: Reinforcement Learning via Sequence Modeling [code]
- Conservative Q-Learning for Offline Reinforcement Learning [code]
CQL: adds a regularization term to the Q-learning update in offline RL to mitigate over-estimation. With the regularizer, the learned Q function is a lower bound on the true Q, keeping value estimates conservative and mitigating the distribution-shift problem in offline RL. A minimal sketch of the regularizer is given below.
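A minimal sketch of the conservative regularizer described above, assuming a discrete action space and a Q-network that maps a batch of states to per-action values; function and variable names are illustrative, not the authors' implementation.

```python
# CQL-style loss sketch for discrete actions: TD error + conservative regularizer.
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0):
    s, a, r, s_next, done = batch  # tensors sampled from the offline dataset

    # Standard Bellman (TD) error on dataset transitions.
    q_all = q_net(s)                                   # (B, num_actions)
    q_data = q_all.gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_q_net(s_next).max(dim=1).values
    td_loss = F.mse_loss(q_data, target)

    # Conservative term: push down Q over all actions (log-sum-exp) and push up Q
    # on actions actually present in the dataset, yielding a lower bound on Q.
    conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()

    return td_loss + alpha * conservative
```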
- Off-policy deep reinforcement learning without exploration
BCQ: constrains the mismatch between the state-action visitation of the policy and the state-action pairs contained in the batch by using a state-conditioned generative model to produce only previously seen actions.
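A minimal sketch of BCQ-style action selection under this constraint, assuming a trained state-conditioned VAE decoder, an optional perturbation network, and a Q-network; all names, attributes, and signatures below are assumptions for illustration.

```python
# BCQ-style action selection: choose greedily among generative-model candidates.
import torch

@torch.no_grad()
def bcq_select_action(s, vae, perturb, q_net, num_candidates=10, max_perturbation=0.05):
    """s is a single state tensor of shape (state_dim,); `vae.decode`, `perturb`,
    and `q_net` are assumed, pre-trained modules."""
    # Repeat the state and sample candidate actions from the generative model,
    # so only actions close to the behavior data are considered.
    s_rep = s.repeat(num_candidates, 1)                       # (N, state_dim)
    z = torch.randn(num_candidates, vae.latent_dim).clamp(-0.5, 0.5)
    candidates = vae.decode(s_rep, z)                         # (N, action_dim)

    # A small learned perturbation keeps candidates near the data manifold
    # while allowing limited improvement over the behavior policy.
    candidates = candidates + max_perturbation * perturb(s_rep, candidates)

    # Greedy selection among the constrained candidates.
    q_values = q_net(s_rep, candidates).squeeze(-1)           # (N,)
    return candidates[q_values.argmax()]
```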
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [code]
- Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction [code]
BEAR: addresses the data-shift problem that arises when learning from a static dataset in offline RL. During the Bellman iteration that updates Q, the error accumulated due to action distribution shift can keep growing and eventually prevent convergence. The paper shows how restricting action selection alleviates this issue and proposes bootstrapping error accumulation reduction (BEAR), which uses support-set matching to avoid it. A minimal support-matching sketch is given below.
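A minimal sketch of the support-set matching idea: a kernel MMD penalty between actions sampled from the learned policy and actions sampled from (a model of) the behavior policy, which can be added to the actor loss. The Gaussian kernel and tensor shapes below are assumptions, not the paper's exact formulation.

```python
# Kernel MMD penalty between policy-sampled and behavior-sampled actions.
import torch

def gaussian_kernel(x, y, sigma=2.0):
    # x: (B, N, D), y: (B, M, D) -> pairwise kernel values of shape (B, N, M)
    diff = x.unsqueeze(2) - y.unsqueeze(1)
    return torch.exp(-diff.pow(2).sum(-1) / (2.0 * sigma ** 2))

def mmd_penalty(policy_actions, behavior_actions, sigma=2.0):
    """Squared MMD between policy-sampled and behavior-sampled actions per state,
    averaged over the batch; minimizing it keeps the policy within data support."""
    k_pp = gaussian_kernel(policy_actions, policy_actions, sigma).mean(dim=(1, 2))
    k_bb = gaussian_kernel(behavior_actions, behavior_actions, sigma).mean(dim=(1, 2))
    k_pb = gaussian_kernel(policy_actions, behavior_actions, sigma).mean(dim=(1, 2))
    return (k_pp + k_bb - 2.0 * k_pb).mean()
```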
- Offline Decentralized Multi-Agent Reinforcement Learning
MABCQ: constrains the latent vector of a conditional VAE model in order to constrain the Q-learning networks when moving from offline learning to online; the method is evaluated on MA-MuJoCo.
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
ICQ: essentially still constrains the Q-network; adding a mixing network yields ICQ-MA, an offline MARL method.
- Boosting Offline Reinforcement Learning with Residual Generative Modeling
Also uses a conditional VAE to constrain the policy learned during offline training (a generic conditional-VAE behavior-model sketch is given below).
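Several entries above (BCQ, MABCQ, and this paper) rely on a state-conditioned VAE trained as a behavior model on the offline dataset. The sketch below shows a generic conditional VAE with the standard ELBO (reconstruction + KL) loss; the architecture and hyperparameters are assumptions for illustration, not any paper's exact model.

```python
# Generic state-conditioned VAE behavior model trained on offline (s, a) pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=32):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.log_std = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def decode(self, s, z):
        # Generate a dataset-like action for state s from latent code z.
        return self.decoder(torch.cat([s, z], dim=-1))

    def loss(self, s, a):
        # Encode (s, a), sample a latent via the reparameterization trick,
        # and combine reconstruction error with a KL term to N(0, I).
        h = self.encoder(torch.cat([s, a], dim=-1))
        mu, log_std = self.mu(h), self.log_std(h).clamp(-4, 4)
        z = mu + log_std.exp() * torch.randn_like(mu)
        recon_loss = F.mse_loss(self.decode(s, z), a)
        kl = -0.5 * (1 + 2 * log_std - mu.pow(2) - (2 * log_std).exp()).mean()
        return recon_loss + 0.5 * kl
```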
- Behavior Constraining in Weight Space for Offline Reinforcement Learning
- Offline Reinforcement Learning with Fisher Divergence Critic Regularization
- Offline Meta-Reinforcement Learning with Online Self-Supervision
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
- Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
- S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning in Robotics
- Weighted Model Estimation for Offline Model-Based Reinforcement Learning