This repository includes Decision Transformer, Conservative Q-Learning, and Behavior Cloning implementations for OpenAI Gym's MuJoCo environments. Instead of training a policy with conventional reinforcement learning algorithms, the models are trained in the offline reinforcement learning setting, on experiences collected by an expert player. Unlike traditional RL, where data are obtained through interaction with the environment, in offline RL the policy only has access to a fixed dataset of trajectory rollouts. This restricts the agent: it cannot explore the environment or receive feedback on its own actions. Nevertheless, offline RL approaches have shown strong results despite these difficulties.
- PyTorch
- OpenAI Gym
- D4RL
- Hugging Face Transformers (GPT-2 Model)
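As a minimal illustration of the offline setting described above, a D4RL dataset can be loaded as a static buffer of transitions. This is a sketch, not the repository's exact code; the environment name is illustrative:

```python
# Offline RL: the policy never interacts with the environment at train time;
# it only reads a fixed dataset of expert trajectories.
import gym
import d4rl  # registers the offline datasets with gym

env = gym.make("halfcheetah-medium-v2")  # illustrative dataset choice
dataset = env.get_dataset()              # dict of numpy arrays

observations = dataset["observations"]   # shape (N, obs_dim)
actions = dataset["actions"]             # shape (N, act_dim)
rewards = dataset["rewards"]             # shape (N,)
terminals = dataset["terminals"]         # shape (N,), episode-end flags

print(observations.shape, actions.shape)
```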
- Behavior Cloning (BC)
- Decision Transformer
- GPT-2
- Soft Actor-Critic (SAC)
- Conservative Q-Learning (CQL)
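Decision Transformer casts offline RL as sequence modeling: each timestep is represented as a (return-to-go, state, action) triple, and a GPT-2-style causal transformer is trained to predict the next action. A minimal sketch of the return-to-go computation and the input layout follows (the values and context length are illustrative):

```python
import numpy as np

def returns_to_go(rewards):
    """R_t = sum of rewards from timestep t to the end of the trajectory."""
    rtg = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

# One trajectory from the offline dataset (illustrative values).
rewards = np.array([1.0, 0.5, 2.0, 0.0])
rtg = returns_to_go(rewards)  # -> [3.5, 2.5, 2.0, 0.0]

# The transformer consumes interleaved (return-to-go, state, action) tokens
# over a fixed context window of K timesteps and is trained to predict each
# action token:
#   (R_1, s_1, a_1, R_2, s_2, a_2, ..., R_K, s_K, a_K)
```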
HalfCheetah: trained for 1M iterations, sampling a batch of 256 rollouts at each iteration (see the training-loop sketch below).

- HC-BC.mp4 (Behavior Cloning)
- HC-CQL.mp4 (Conservative Q-Learning)
- HC-DT.mp4 (Decision Transformer)
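A minimal sketch of the loop shape used for these runs: 1M gradient steps, each on a batch of 256 samples drawn from the offline dataset. The behavior-cloning loss and network below are illustrative placeholders, not the repository's exact code:

```python
import numpy as np
import torch
import torch.nn as nn

obs_dim, act_dim = 17, 6  # HalfCheetah-sized dimensions, illustrative
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                       nn.Linear(256, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Stand-ins for the arrays loaded from D4RL above.
observations = np.random.randn(10_000, obs_dim).astype(np.float32)  # placeholder
actions = np.random.randn(10_000, act_dim).astype(np.float32)       # placeholder

for step in range(1_000_000):
    # Sample a batch of 256 transitions from the fixed dataset.
    idx = np.random.randint(0, len(observations), size=256)
    obs = torch.from_numpy(observations[idx])
    act = torch.from_numpy(actions[idx])
    loss = ((policy(obs) - act) ** 2).mean()  # BC: regress expert actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```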
Hopper: trained for 1M iterations, sampling a batch of 256 rollouts at each iteration.

- H-BC.mp4 (Behavior Cloning)
- H-CQL.mp4 (Conservative Q-Learning)
- H-DT.mp4 (Decision Transformer)
Walker2d: trained for 1M iterations, sampling a batch of 256 rollouts at each iteration.

- W-BC.mp4 (Behavior Cloning)
- W-CQL.mp4 (Conservative Q-Learning)
- W-DT.mp4 (Decision Transformer)
- Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. Decision Transformer: Reinforcement Learning via Sequence Modeling. arXiv, 2021.
- Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine. Conservative Q-Learning for Offline Reinforcement Learning. arXiv, 2020.
- Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv, 2020.
- Hugging Face Transformers
- Edward Beeching's Gym Replays
- CleanRL and the authors' Study Group