
Decision Transformer on Offline Reinforcement Learning

This repository contains implementations of Decision Transformer, Conservative Q-Learning, and Behavior Cloning on OpenAI Gym's MuJoCo environments. Instead of training a policy through conventional reinforcement learning algorithms, the models are trained on experiences collected by an expert player, following the Offline Reinforcement Learning setting. In Offline RL, unlike traditional RL where data are obtained through interaction with the environment, the policy only has access to a fixed dataset of trajectory rollouts. This setting restricts the agent by preventing it from exploring the environment and obtaining feedback; nevertheless, Offline RL approaches have shown strong results despite these difficulties.
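To illustrate the offline setting described above, the sketch below samples minibatches from a fixed transition dataset instead of querying the environment; all names and dimensions here are illustrative (the observation/action sizes match HalfCheetah), not the repository's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset: fixed arrays of transitions collected by
# an expert policy. The agent never interacts with the environment.
dataset = {
    "observations": rng.normal(size=(10_000, 17)),       # HalfCheetah obs dim
    "actions": rng.normal(size=(10_000, 6)),             # HalfCheetah act dim
    "rewards": rng.normal(size=(10_000,)),
    "next_observations": rng.normal(size=(10_000, 17)),
    "terminals": np.zeros(10_000, dtype=bool),
}

def sample_batch(dataset, batch_size=256):
    """Sample a minibatch of transitions uniformly from the fixed dataset."""
    idx = rng.integers(0, len(dataset["rewards"]), size=batch_size)
    return {key: value[idx] for key, value in dataset.items()}

batch = sample_batch(dataset)
print(batch["observations"].shape)  # (256, 17)
```

Every method in this repository consumes data this way; the algorithms differ only in the loss they compute on each sampled batch.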

Requirements

Contents

  • Behavior Cloning (BC)
  • Decision Transformer
  • GPT-2
  • Soft Actor-Critic (SAC)
  • Conservative Q-Learning (CQL)
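Behavior Cloning is the simplest of these baselines: it treats the dataset as supervised (state, action) pairs and regresses the policy's action onto the expert's. A minimal NumPy sketch with a linear policy and a synthetic linear expert (all names are illustrative, not the repository's code):

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, n = 17, 6, 1000

# Hypothetical expert data: actions come from an unknown linear expert.
W_expert = rng.normal(size=(obs_dim, act_dim))
states = rng.normal(size=(n, obs_dim))
actions = states @ W_expert

# Behavior Cloning: minimize the mean squared error between the policy's
# action and the expert's action by gradient descent on the policy weights.
W = np.zeros((obs_dim, act_dim))
lr = 0.01
for _ in range(500):
    pred = states @ W
    grad = 2 * states.T @ (pred - actions) / n  # gradient of the MSE w.r.t. W
    W -= lr * grad

mse = np.mean((states @ W - actions) ** 2)
print(f"final BC mse: {mse:.6f}")
```

In practice the linear policy is replaced by an MLP (or, for Decision Transformer, a GPT-2 sequence model conditioned on returns-to-go), but the supervised objective is the same.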

Experiments

HalfCheetah-v3 Environment

Behavior Cloning

HC-BC.mp4

Conservative Q-Learning

The model was trained for 1M iterations; at each iteration a batch of 256 rollouts was sampled.
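On each of those batches, CQL adds a conservative penalty on top of the standard Bellman error: it pushes down Q-values on out-of-distribution actions while pushing up Q on the actions actually in the dataset. For discrete actions this term reduces to logsumexp over the Q-values minus the dataset-action Q. A toy NumPy sketch of just that regularizer (illustrative only; the repository's continuous-action version samples actions instead of enumerating them):

```python
import numpy as np

def cql_regularizer(q_values, data_actions):
    """Conservative penalty for discrete actions:
    E[logsumexp_a Q(s, a)] - E[Q(s, a_data)].

    q_values:     (batch, n_actions) Q-values for each state in the batch
    data_actions: (batch,) indices of the actions taken in the dataset
    """
    # Numerically stable logsumexp over the action dimension.
    m = q_values.max(axis=1, keepdims=True)
    lse = (m + np.log(np.exp(q_values - m).sum(axis=1, keepdims=True))).squeeze(1)
    data_q = q_values[np.arange(len(data_actions)), data_actions]
    return np.mean(lse - data_q)

q = np.array([[1.0, 2.0, 0.5],
              [0.0, 0.0, 0.0]])
print(round(float(cql_regularizer(q, np.array([1, 0]))), 3))  # 0.782
```

Minimizing this term alongside the Bellman error yields a Q-function that lower-bounds the true values on unseen actions, which is what keeps the learned policy close to the data distribution.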

HC-CQL

HC-CQL.mp4

Decision Transformer

HC-DT.mp4

Hopper-v2 Environment

Behavior Cloning

H-BC.mp4

Conservative Q-Learning

The model was trained for 1M iterations; at each iteration a batch of 256 rollouts was sampled.

H-CQL

H-CQL.mp4

Decision Transformer

H-DT.mp4

Walker2d-v2 Environment

Behavior Cloning

W-BC.mp4

Conservative Q-Learning

The model was trained for 1M iterations; at each iteration a batch of 256 rollouts was sampled.

W-CQL

W-CQL.mp4

Decision Transformer

W-DT.mp4

References

  1. Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. Decision Transformer: Reinforcement Learning via Sequence Modeling. arXiv, 2021.
  2. Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine. Conservative Q-Learning for Offline Reinforcement Learning. arXiv, 2020.
  3. Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv, 2020.
  4. Hugging Face Transformers
  5. Edward Beeching's Gym Replays
  6. CleanRL and the authors' Study Group