XuanPolicy is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations.
We call it Xuan-Ce (玄策) in Chinese: "Xuan (玄)" means incredible and magical, like a mystery box, and "Ce (策)" means policy.
DRL algorithms are sensitive to hyper-parameter tuning, vary in performance with different implementation tricks, and often suffer from unstable training; as a result, they can sometimes seem elusive and "Xuan". This project provides a thorough, high-quality, and easy-to-understand implementation of DRL algorithms, and we hope it sheds some light on the "magic" of reinforcement learning.
We expect it to be compatible with multiple deep learning toolboxes (PyTorch, TensorFlow, and MindSpore), and we hope it can truly become a zoo full of DRL algorithms.
| Full Documentation | Chinese Documentation (中文文档) | OpenI (启智社区) | XuanCe (Mini version) |
DRL algorithms:
- Deep Q Network - DQN [Paper]
- DQN with Double Q-learning - Double DQN [Paper]
- DQN with Dueling network - Dueling DQN [Paper]
- DQN with Prioritized Experience Replay - PER [Paper]
- DQN with Parameter Space Noise for Exploration - NoisyNet [Paper]
- Deep Recurrent Q-Network - DRQN [Paper]
- DQN with Quantile Regression - QRDQN [Paper]
- Distributional Reinforcement Learning - C51 [Paper]
- Vanilla Policy Gradient - PG [Paper]
- Phasic Policy Gradient - PPG [Paper] [Code]
- Advantage Actor Critic - A2C [Paper] [Code]
- Soft Actor-Critic based on maximum entropy - SAC [Paper] [Code]
- Soft Actor-Critic for discrete actions - SAC-Discrete [Paper] [Code]
- Proximal Policy Optimization with clipped objective - PPO-Clip [Paper] [Code]
- Proximal Policy Optimization with KL divergence - PPO-KL [Paper] [Code]
- Deep Deterministic Policy Gradient - DDPG [Paper] [Code]
- Twin Delayed Deep Deterministic Policy Gradient - TD3 [Paper][Code]
- Parameterised Deep Q-Network - P-DQN [Paper]
- Multi-pass Parameterised Deep Q-Network - MP-DQN [Paper] [Code]
- Split Parameterised Deep Q-Network - SP-DQN [Paper]
Multi-agent DRL (MARL) algorithms:
- Independent Q-learning - IQL [Paper] [Code]
- Value Decomposition Networks - VDN [Paper] [Code]
- Q-mixing networks - QMIX [Paper] [Code]
- Weighted Q-mixing networks - WQMIX [Paper] [Code]
- Q-transformation - QTRAN [Paper] [Code]
- Deep Coordination Graphs - DCG [Paper] [Code]
- Independent Deep Deterministic Policy Gradient - IDDPG [Paper]
- Multi-agent Deep Deterministic Policy Gradient - MADDPG [Paper] [Code]
- Counterfactual Multi-agent Policy Gradient - COMA [Paper] [Code]
- Multi-agent Proximal Policy Optimization - MAPPO [Paper] [Code]
- Mean-Field Q-learning - MFQ [Paper] [Code]
- Mean-Field Actor-Critic - MFAC [Paper] [Code]
- Independent Soft Actor-Critic - ISAC
- Multi-agent Soft Actor-Critic - MASAC [Paper]
- Multi-agent Twin Delayed Deep Deterministic Policy Gradient - MATD3 [Paper]
The MARL algorithms above can be trained on environments such as the StarCraft Multi-Agent Challenge.
The library can be run on Linux, Windows, macOS, and EulerOS.
Before installing XuanPolicy, you should install Anaconda to prepare a Python environment. (Note: select a proper version of Anaconda from here.)
After that, open a terminal and install XuanPolicy with the following steps.
Step 1: Create a new conda environment (Python 3.7 or newer is suggested):

```bash
conda create -n xpolicy python=3.7
```

Step 2: Activate the conda environment:

```bash
conda activate xpolicy
```

Step 3: Install the library:

```bash
pip install xuanpolicy
```
This command does not include the dependencies of the deep learning toolboxes. To install XuanPolicy together with a deep learning toolbox, type one of the following:

```bash
pip install xuanpolicy[torch]       # PyTorch
pip install xuanpolicy[tensorflow]  # TensorFlow 2
pip install xuanpolicy[mindspore]   # MindSpore
pip install xuanpolicy[all]         # all of the above
```
Note: some extra packages must be installed manually for further usage (for example, the Atari benchmarks below require the Atari environment packages for Gym/Gymnasium; see the documentation for details).
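To check that the base installation works, a minimal sanity test is to import the package and look up the runner factory used in the examples below:

```python
# Minimal sanity check: both the import and the get_runner attribute
# should succeed after `pip install xuanpolicy`.
import xuanpolicy as xp

print(xp.get_runner)
```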
To train a model, run the following:

```python
import xuanpolicy as xp

# Create a runner that trains DQN on the CartPole-v1 classic-control task.
runner = xp.get_runner(method='dqn',
                       env='classic_control',
                       env_id='CartPole-v1',
                       is_test=False)
runner.run()
```
To test the trained model, run:

```python
import xuanpolicy as xp

# Reuse the same configuration, but in test mode.
runner_test = xp.get_runner(method='dqn',
                            env='classic_control',
                            env_id='CartPole-v1',
                            is_test=True)
runner_test.run()
```
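The same pattern should carry over to the other algorithms listed above. The following is a sketch only: the method string 'ppo' and the env string 'mujoco' are assumptions inferred from the algorithm list and the benchmark tables below, not confirmed API values.

```python
import xuanpolicy as xp

# Assumed identifiers: 'ppo' and 'mujoco' are illustrative guesses;
# check the documentation for the exact method/env strings.
runner = xp.get_runner(method='ppo',
                       env='mujoco',
                       env_id='Ant-v4',
                       is_test=False)
runner.run()
```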
You can use TensorBoard to visualize the training process. After training, the log files are automatically generated under the "./logs/" directory, and you should see the training data after running:

```bash
tensorboard --logdir ./logs/dqn/torch/CartPole-v1
```
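If you prefer to post-process the logged scalars yourself, TensorBoard's Python API can read the event files directly. A minimal sketch follows; the log path below is an assumption based on the command above, and the available scalar tags depend on your run:

```python
from tensorboard.backend.event_processing import event_accumulator

# Assumed log path for illustration; adjust to your own run directory.
ea = event_accumulator.EventAccumulator("./logs/dqn/torch/CartPole-v1")
ea.Reload()  # load all events from disk

scalar_tags = ea.Tags()["scalars"]  # list the available scalar series
print(scalar_tags)

# Dump the (step, value) pairs of the first scalar series, if any.
if scalar_tags:
    for event in ea.Scalars(scalar_tags[0]):
        print(event.step, event.value)
```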
Benchmark results on MuJoCo tasks:

| Task | DDPG | TD3 | PG | A2C | PPO | PPG | SAC |
|---|---|---|---|---|---|---|---|
| Ant-v4 | 1472.8 | 4822.9 | 317.53 | 1420.4 | 2810.7 | 775.26 | 727.25 |
| HalfCheetah-v4 | 10093 | 10718.1 | 891.27 | 2674.5 | 4628.4 | 1235.76 | 6663.20 |
| Hopper-v4 | 3434.9 | 3492.4 | 5380 | 825.9 | 3450.1 | 174.5 | 2436.96 |
| Walker2d-v4 | 2443.7 | 4307.9 | 316.21 | 970.6 | 4318.6 | 46.83 | 1367.31 |
| Swimmer-v4 | 67.7 | 59.9 | 33.54 | 51.4 | 108.9 | 37.69 | 43.82 |
| Humanoid-v4 | 99 | 547.88 | 322.05 | 240.9 | 705.5 | 78.29 | 358.70 |
| Reacher-v4 | -4.05 | -4.07 | -19.20 | -11.7 | -8.1 | -6.76 | -2.67 |
| InvertedPendulum-v4 | 1000 | 1000 | 1000 | 1000 | 1000 | 160.40 | 1000 |
| InvertedDoublePendulum-v4 | 9359.8 | 9358.9 | 481.93 | 9357.8 | 9359.1 | 7023.87 | 9359.81 |
Benchmark results on Atari (ALE) tasks:

| Task | DQN | C51 | PPO |
|---|---|---|---|
| ALE/AirRaid-v5 | 7316.67 | 5450.00 | 9283.33 |
| ALE/Alien-v5 | 2676.67 | 2413.33 | 2313.33 |
| ALE/Amidar-v5 | 627.00 | 293.00 | 964.67 |
| ALE/Assault-v5 | 9981.67 | 9088.67 | 6265.67 |
| ALE/Asterix-v5 | 30516.67 | 12866.67 | 2900.00 |
| ALE/Asteroids-v5 | 1393.33 | 2180.00 | 3430.00 |
If you use XuanPolicy in your research, please cite it as follows:

```bibtex
@misc{XuanPolicy2023,
    title={XuanPolicy: A Comprehensive and Unified Deep Reinforcement Learning Library},
    author={Wenzhang Liu and Wenzhe Cai and Kun Jiang and Yuanda Wang and Guangran Cheng and Jiawei Wang and Jingyu Cao and Lele Xu and Chaoxu Mu and Changyin Sun},
    publisher={GitHub},
    year={2023},
}
```