KAIR is a research repository of state-of-the-art reinforcement learning (RL) algorithms for robot control tasks. It allows researchers to experiment with novel ideas with minimal code changes.
The `scripts` folder contains implementations of a curated list of RL algorithms, verified on MuJoCo environments.
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
  - TD3 (Fujimoto et al., 2018) is an extension of DDPG (Lillicrap et al., 2015), a deterministic policy gradient algorithm that uses deep neural networks for function approximation. Inspired by Deep Q-Networks (Mnih et al., 2015), DDPG uses experience replay and target networks to improve stability. TD3 further improves DDPG by adding clipped double Q-learning (a variant of Double Q-learning; Van Hasselt, 2010) to mitigate overestimation bias (Thrun & Schwartz, 1993) and by delaying policy updates to reduce variance. A minimal sketch of the clipped target computation appears after this list.
  - Example Script on LunarLander
  - ArXiv Preprint
- (Twin) Soft Actor Critic (SAC)
  - SAC (Haarnoja et al., 2018a) incorporates maximum entropy reinforcement learning, where the agent's goal is to maximize expected reward and entropy concurrently. Combined with the twin-Q technique from TD3, SAC achieves state-of-the-art performance on various continuous control tasks. SAC has also been extended to automatically tune the temperature parameter (Haarnoja et al., 2018b), which determines the importance of entropy relative to the expected reward. A sketch of the soft Bellman target and the temperature loss appears after this list.
  - Example Script on LunarLander
  - ArXiv Preprint (Original SAC)
  - ArXiv Preprint (SAC with autotuned temperature)
- TD3 from Demonstrations, SAC from Demonstrations (TD3fD, SACfD)
  - DDPGfD (Vecerik et al., 2017) is an imitation learning algorithm that injects demonstration data into the experience replay buffer. DDPGfD also improves DDPG by (1) using prioritized experience replay (Schaul et al., 2015), (2) adding n-step returns, (3) learning multiple times per environment step, and (4) adding L2 regularizers to the actor and critic losses. We incorporated these improvements into TD3 and SAC and found that they dramatically improve performance. A sketch of the demonstration-replay and n-step-return pieces appears after this list.
  - Example Script of TD3fD on LunarLander
  - Example Script of SACfD on LunarLander
  - ArXiv Preprint
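As a rough illustration of the TD3 mechanics described above, the sketch below computes the clipped double-Q Bellman target with target policy smoothing. It assumes a PyTorch-style setup; `actor_target`, `critic1_target`, and `critic2_target` are hypothetical network objects, not the repository's actual code.

```python
import torch

def td3_target(reward, next_obs, done, actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Clipped double-Q Bellman target with target policy smoothing."""
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped Gaussian noise.
        next_action = actor_target(next_obs)
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-1.0, 1.0)
        # Clipped double Q-learning: take the minimum of the two target critics
        # to mitigate overestimation bias.
        q_min = torch.min(critic1_target(next_obs, next_action),
                          critic2_target(next_obs, next_action))
        return reward + gamma * (1.0 - done) * q_min
```

In this framing, the delayed policy updates amount to updating the actor and the target networks only once every few critic updates in the training loop.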
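Along the same lines, the following sketch shows SAC's entropy-regularized (soft) Bellman target and one common form of the automatic temperature objective. The networks, the `actor.sample` method, and `log_alpha` are assumed names for illustration only.

```python
import torch

def sac_target(reward, next_obs, done, actor, critic1_target, critic2_target,
               log_alpha, gamma=0.99):
    """Soft Bellman target: reward plus discounted (min twin-Q - alpha * log pi)."""
    with torch.no_grad():
        next_action, next_log_prob = actor.sample(next_obs)  # assumed stochastic-policy sampler
        q_min = torch.min(critic1_target(next_obs, next_action),
                          critic2_target(next_obs, next_action))
        # Entropy-regularized value: the entropy bonus is weighted by the temperature alpha.
        soft_value = q_min - log_alpha.exp() * next_log_prob
        return reward + gamma * (1.0 - done) * soft_value

def temperature_loss(log_alpha, log_prob, target_entropy):
    """One common form of the automatic temperature objective: adjust alpha so the
    policy's entropy stays near target_entropy (often set to -action_dim)."""
    return -(log_alpha.exp() * (log_prob + target_entropy).detach()).mean()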
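Finally, a minimal sketch of two of the DDPGfD-style additions listed above: preloading demonstration transitions into a replay buffer and computing n-step returns. For brevity the buffer here is uniform rather than prioritized, and all names are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform replay buffer; DDPGfD-style methods use a prioritized variant instead."""
    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)

def preload_demonstrations(buffer, demo_transitions):
    """Seed the replay buffer with demonstration transitions before training starts."""
    for transition in demo_transitions:
        buffer.add(transition)

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step return: r_t + gamma*r_{t+1} + ... + gamma^n * V(s_{t+n})."""
    ret = bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret
```

The other two ingredients are straightforward in this framing: learning multiple times per environment step means calling the update routine in a loop, and the L2 regularizers can typically be added through an optimizer's weight-decay option.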
To use the algorithms, first install the required Python packages from PyPI using the `requirements.txt` file:
cd scripts
pip install -r requirements.txt
We are currently writing a white paper to summarize the results. We will add a BibTeX entry below once the paper is finalized.