Clean, Robust, and Unified PyTorch implementation of popular DRL Algorithms
This repository uses the following Python dependencies unless stated otherwise (a quick version check is sketched after the list):
gymnasium==0.29.1
numpy==1.26.1
pytorch==2.1.0
python==3.11.5
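If you want to confirm that your environment matches these versions, a minimal check (not part of the repository, just an illustration) looks like this:

```python
# Quick environment check: print the versions of the core dependencies listed above.
import sys

import gymnasium
import numpy
import torch

print("python    :", sys.version.split()[0])  # expected: 3.11.5
print("gymnasium :", gymnasium.__version__)   # expected: 0.29.1
print("numpy     :", numpy.__version__)       # expected: 1.26.1
print("pytorch   :", torch.__version__)       # expected: 2.1.0
```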
Enter the folder of the algorithm you want to use and run `main.py` to train from scratch:
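For example, to train the DQN agent (the folder name below is illustrative; use the actual folder name in this repository):

```bash
cd DQN
python main.py
```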
For more details, please check the README.md file in the corresponding algorithm folder.
3. Separate links to the code
4. Recommended Resources for DRL
4.1 Simulation Environments:
- gym and gymnasium (Lightweight & Standard Env for DRL; Easy to start; Slow; see the minimal example after this list)
- Isaac Gym (NVIDIA’s physics simulation environment; GPU accelerated; Super fast)
- Sparrow (Lightweight Simulator for Mobile Robots; DRL friendly)
- ROS (Popular & Comprehensive physical simulator for robots; Heavy and Slow)
- Webots (Popular physical simulator for robots; Faster than ROS; Less realistic)
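Since gym/gymnasium is the easiest way to get started, here is a minimal random-agent sketch (generic gymnasium usage, not code from this repository) showing the interaction loop that the algorithms in this repo are trained on:

```python
import gymnasium as gym

# Create a standard control task and roll out one episode with random actions.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()  # placeholder for a trained DRL policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated

env.close()
print("Episode return:", episode_return)
```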
5. Papers of the Implemented Algorithms
DQN: Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
Double DQN: Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double q-learning[C]//Proceedings of the AAAI conference on artificial intelligence. 2016, 30(1).
Duel DQN: Wang Z, Schaul T, Hessel M, et al. Dueling network architectures for deep reinforcement learning[C]//International conference on machine learning. PMLR, 2016: 1995-2003.
PER: Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J]. arXiv preprint arXiv:1511.05952, 2015.
C51: Bellemare M G, Dabney W, Munos R. A distributional perspective on reinforcement learning[C]//International conference on machine learning. PMLR, 2017: 449-458.
NoisyNet DQN: Fortunato M, Azar M G, Piot B, et al. Noisy networks for exploration[J]. arXiv preprint arXiv:1706.10295, 2017.
PPO: Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.
DDPG: Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.
TD3: Fujimoto S, Van Hoof H, Meger D. Addressing function approximation error in actor-critic methods[C]//International conference on machine learning. PMLR, 2018: 1587-1596.
SAC: Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International conference on machine learning. PMLR, 2018: 1861-1870.
ASL: Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity
6. Training Curves of my Code:
(Training-curve figures omitted here; they cover CartPole, LunarLander, Pong, Enduro, Pendulum, and LunarLanderContinuous.)