XuanPolicy is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations.
We call it Xuan-Ce (玄策) in Chinese: "Xuan (玄)" means incredible and magical, like a mystery box, and "Ce (策)" means policy.
DRL algorithms are sensitive to hyper-parameter tuning, vary in performance with different implementation tricks, and often suffer from unstable training; as a result, they can sometimes seem elusive and "Xuan". This project provides a thorough, high-quality, and easy-to-understand implementation of DRL algorithms, and we hope it sheds some light on the "magic" of reinforcement learning.
We expect it to be compatible with multiple deep learning toolboxes (PyTorch, TensorFlow, and MindSpore), and we hope it can truly become a zoo full of DRL algorithms.
| Full Documentation | Chinese Documentation (中文文档) | OpenI (启智社区) | XuanCe (Mini version) |
DRL algorithms:
- Deep Q Network - DQN [Paper]
- DQN with Double Q-learning - Double DQN [Paper]
- DQN with Dueling network - Dueling DQN [Paper]
- DQN with Prioritized Experience Replay - PER [Paper]
- DQN with Parameter Space Noise for Exploration - NoisyNet [Paper]
- Deep Recurrent Q-Network - DRQN [Paper]
- DQN with Quantile Regression - QRDQN [Paper]
- Distributional Reinforcement Learning - C51 [Paper]
- Vanilla Policy Gradient - PG [Paper]
- Phasic Policy Gradient - PPG [Paper] [Code]
- Advantage Actor Critic - A2C [Paper] [Code]
- Soft Actor-Critic based on maximum entropy - SAC [Paper] [Code]
- Soft Actor-Critic for discrete actions - SAC-Discrete [Paper] [Code]
- Proximal Policy Optimization with clipped objective - PPO-Clip [Paper] [Code]
- Proximal Policy Optimization with KL divergence - PPO-KL [Paper] [Code]
- Deep Deterministic Policy Gradient - DDPG [Paper] [Code]
- Twin Delayed Deep Deterministic Policy Gradient - TD3 [Paper][Code]
- Parameterised Deep Q-Network - P-DQN [Paper]
- Multi-pass Parameterised Deep Q-Network - MP-DQN [Paper] [Code]
- Split Parameterised Deep Q-Network - SP-DQN [Paper]
Multi-agent DRL (MARL) algorithms:
- Independent Q-learning - IQL [Paper] [Code]
- Value Decomposition Networks - VDN [Paper] [Code]
- Q-mixing networks - QMIX [Paper] [Code]
- Weighted Q-mixing networks - WQMIX [Paper] [Code]
- Q-transformation - QTRAN [Paper] [Code]
- Deep Coordination Graphs - DCG [Paper] [Code]
- Independent Deep Deterministic Policy Gradient - IDDPG [Paper]
- Multi-agent Deep Deterministic Policy Gradient - MADDPG [Paper] [Code]
- Counterfactual Multi-agent Policy Gradient - COMA [Paper] [Code]
- Multi-agent Proximal Policy Optimization - MAPPO [Paper] [Code]
- Mean-Field Q-learning - MFQ [Paper] [Code]
- Mean-Field Actor-Critic - MFAC [Paper] [Code]
- Independent Soft Actor-Critic - ISAC
- Multi-agent Soft Actor-Critic - MASAC [Paper]
- Multi-agent Twin Delayed Deep Deterministic Policy Gradient - MATD3 [Paper]
The MARL algorithms above can be trained on environments such as the StarCraft Multi-Agent Challenge.
The library can be run on Linux, Windows, macOS, and EulerOS.
Before installing XuanPolicy, you should install Anaconda to prepare a Python environment. (Note: select a proper version of Anaconda from here.)
After that, open a terminal and install XuanPolicy with the following steps.
Step 1: Create a new conda environment (Python 3.7 or newer is suggested):

```bash
conda create -n xpolicy python=3.7
```

Step 2: Activate the conda environment:

```bash
conda activate xpolicy
```

Step 3: Install the library:

```bash
pip install xuanpolicy
```
This command does not include the dependencies of the deep learning toolboxes. To install XuanPolicy together with a deep learning toolbox, type one of the following:

```bash
pip install xuanpolicy[torch]       # PyTorch
pip install xuanpolicy[tensorflow]  # TensorFlow 2
pip install xuanpolicy[mindspore]   # MindSpore
pip install xuanpolicy[all]         # all of the above
```
Note: some extra packages must be installed manually for further usage (for example, the Atari benchmarks below require the Atari environment packages for Gym/Gymnasium; see the documentation for details).
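To check that the base installation works, a minimal sanity test is to import the package and look up the runner factory used in the examples below:

```python
# Minimal sanity check: both the import and the get_runner attribute
# should succeed after `pip install xuanpolicy`.
import xuanpolicy as xp

print(xp.get_runner)
```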
To train a model, run the following:

```python
import xuanpolicy as xp

# Create a runner that trains DQN on the CartPole-v1 classic-control task.
runner = xp.get_runner(method='dqn',
                       env='classic_control',
                       env_id='CartPole-v1',
                       is_test=False)
runner.run()
```
To test the trained model, run:

```python
import xuanpolicy as xp

# Reuse the same configuration, but in test mode.
runner_test = xp.get_runner(method='dqn',
                            env='classic_control',
                            env_id='CartPole-v1',
                            is_test=True)
runner_test.run()
```
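The same pattern should carry over to the other algorithms listed above. The following is a sketch only: the method string 'ppo' and the env string 'mujoco' are assumptions inferred from the algorithm list and the benchmark tables below, not confirmed API values.

```python
import xuanpolicy as xp

# Assumed identifiers: 'ppo' and 'mujoco' are illustrative guesses;
# check the documentation for the exact method/env strings.
runner = xp.get_runner(method='ppo',
                       env='mujoco',
                       env_id='Ant-v4',
                       is_test=False)
runner.run()
```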
You can use TensorBoard to visualize the training process. After training, the log files are automatically generated under the "./logs/" directory, and you should see the training data after running:

```bash
tensorboard --logdir ./logs/dqn/torch/CartPole-v1
```
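If you prefer to post-process the logged scalars yourself, TensorBoard's Python API can read the event files directly. A minimal sketch follows; the log path below is an assumption based on the command above, and the available scalar tags depend on your run:

```python
from tensorboard.backend.event_processing import event_accumulator

# Assumed log path for illustration; adjust to your own run directory.
ea = event_accumulator.EventAccumulator("./logs/dqn/torch/CartPole-v1")
ea.Reload()  # load all events from disk

scalar_tags = ea.Tags()["scalars"]  # list the available scalar series
print(scalar_tags)

# Dump the (step, value) pairs of the first scalar series, if any.
if scalar_tags:
    for event in ea.Scalars(scalar_tags[0]):
        print(event.step, event.value)
```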
Benchmark results on MuJoCo tasks:

| Task | DDPG | TD3 | PG | A2C | PPO | PPG | SAC |
|---|---|---|---|---|---|---|---|
| Ant-v4 | 1472.8 | 4822.9 | 317.53 | 1420.4 | 2810.7 | 775.26 | 727.25 |
| HalfCheetah-v4 | 10093 | 10718.1 | 891.27 | 2674.5 | 4628.4 | 1235.76 | 6663.20 |
| Hopper-v4 | 3434.9 | 3492.4 | 5380 | 825.9 | 3450.1 | 174.5 | 2436.96 |
| Walker2d-v4 | 2443.7 | 4307.9 | 316.21 | 970.6 | 4318.6 | 46.83 | 1367.31 |
| Swimmer-v4 | 67.7 | 59.9 | 33.54 | 51.4 | 108.9 | 37.69 | 43.82 |
| Humanoid-v4 | 99 | 547.88 | 322.05 | 240.9 | 705.5 | 78.29 | 358.70 |
| Reacher-v4 | -4.05 | -4.07 | -19.20 | -11.7 | -8.1 | -6.76 | -2.67 |
| InvertedPendulum-v4 | 1000 | 1000 | 1000 | 1000 | 1000 | 160.40 | 1000 |
| InvertedDoublePendulum-v4 | 9359.8 | 9358.9 | 481.93 | 9357.8 | 9359.1 | 7023.87 | 9359.81 |
Benchmark results on Atari (ALE) tasks:

| Task | DQN | C51 | PPO |
|---|---|---|---|
| ALE/AirRaid-v5 | 7316.67 | 5450.00 | 9283.33 |
| ALE/Alien-v5 | 2676.67 | 2413.33 | 2313.33 |
| ALE/Amidar-v5 | 627.00 | 293.00 | 964.67 |
| ALE/Assault-v5 | 9981.67 | 9088.67 | 6265.67 |
| ALE/Asterix-v5 | 30516.67 | 12866.67 | 2900.00 |
| ALE/Asteroids-v5 | 1393.33 | 2180.00 | 3430.00 |
If you use XuanPolicy in your research, please cite it as follows:

```bibtex
@misc{XuanPolicy2023,
    title={XuanPolicy: A Comprehensive and Unified Deep Reinforcement Learning Library},
    author={Wenzhang Liu and Wenzhe Cai and Kun Jiang and Yuanda Wang and Guangran Cheng and Jiawei Wang and Jingyu Cao and Lele Xu and Chaoxu Mu and Changyin Sun},
    publisher={GitHub},
    year={2023},
}
```