Hsuanwu: Long-Term Evolution Project of Reinforcement Learning is inspired by the long-term evolution (LTE) standard project in telecommunications. It aims to track the latest research progress in reinforcement learning (RL) and to provide stable and efficient baselines. Hsuanwu covers the main stages of an RL workflow, including training, evaluation, and deployment. Its main features are:
- Latest algorithms and tricks;
- Highly modularized design for complete decoupling of RL algorithms;
- Optimized workflow for full hardware acceleration;
- Support for custom environments;
- Support for multiple computing devices like GPU and NPU;
- Support for RL model engineering deployment (TensorRT, CANN, ...);
- Large number of reusable benchmarks (see HsuanwuHub);
- Elegant experiment management powered by Hydra.
Hsuanwu (Xuanwu, 玄武) is one of the Four Symbols of the Chinese constellations, representing the north and the winter season. It is usually depicted as a turtle entwined with a snake. Since turtles are very long-lived, we use this name to symbolize the long-term and influential development of the project.
Join the developer community for issues and discussions:
Slack | GitHub
Quick Start
Installation
- Prerequisites
Hsuanwu currently recommends Python>=3.8. You can create a virtual environment with:
conda create -n hsuanwu python=3.8
- With pip (recommended)
Open a terminal and install Hsuanwu with pip:
pip install hsuanwu # basic installation
pip install hsuanwu[envs] # for pre-defined environments
pip install hsuanwu[tests] # for project tests
pip install hsuanwu[all] # install all the dependencies
- With git
Open a terminal and clone the repository from GitHub with git:
git clone https://github.com/RLE-Foundation/Hsuanwu.git
After that, from the repository root, run the following to install the package and its dependencies:
pip install -e . # basic installation
pip install -e .[envs] # for pre-defined environments
pip install -e .[tests] # for project tests
pip install -e .[all] # install all the dependencies
For more detailed installation instructions, see https://docs.hsuanwu.dev/getting_started.
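After installing, a quick check can confirm that the interpreter meets the Python>=3.8 recommendation and that the `hsuanwu` distribution is visible to pip. The sketch below uses only the standard library; the helper names `python_ok` and `installed_version` are illustrative and not part of Hsuanwu.

```python
import sys
from importlib import metadata  # available in the standard library since Python 3.8

MIN_PYTHON = (3, 8)  # minimum version recommended in the prerequisites


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter's (major, minor) is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("hsuanwu version:", installed_version("hsuanwu"))
```

If the second line prints `None`, the package is not installed in the active environment.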
Build your first Hsuanwu application
On NVIDIA GPU
For example, to solve a DeepMind Control Suite task with DrQ-v2, only two steps are needed:
- Write a config.yaml file in your working directory like:
experiment: drqv2_dmc # Experiment ID.
device: cuda:0 # Device (cpu, cuda, ...) on which the code should be run.
seed: 1 # Random seed for reproduction.
num_train_steps: 250000 # Number of training steps.
agent:
  name: DrQv2 # The agent name.
- Write a train.py file like:
import hydra # Use Hydra to manage experiments
from hsuanwu.env import make_dmc_env # Import DeepMind Control Suite
from hsuanwu.common.engine import HsuanwuEngine # Import Hsuanwu engine

train_env = make_dmc_env(env_id='cartpole_balance') # Create train env
test_env = make_dmc_env(env_id='cartpole_balance') # Create test env

@hydra.main(version_base=None, config_path='./', config_name='config')
def main(cfgs):
    engine = HsuanwuEngine(cfgs=cfgs, train_env=train_env, test_env=test_env) # Initialize engine
    engine.invoke() # Start training

if __name__ == '__main__':
    main()
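Before launching a long run, it can help to sanity-check the values in config.yaml. The sketch below mirrors the keys of the example configuration in plain dataclasses. The class names, the `validate` helper, and the positivity check are assumptions made for illustration; they are not part of Hsuanwu or Hydra.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    name: str  # agent name, e.g. "DrQv2"


@dataclass
class ExperimentConfig:
    experiment: str       # experiment ID
    device: str           # device string such as "cpu" or "cuda:0"
    seed: int             # random seed
    num_train_steps: int  # number of training steps
    agent: AgentConfig


def validate(raw: dict) -> ExperimentConfig:
    """Build a typed config from a plain dict and check one basic invariant."""
    cfg = ExperimentConfig(
        experiment=str(raw["experiment"]),
        device=str(raw["device"]),
        seed=int(raw["seed"]),
        num_train_steps=int(raw["num_train_steps"]),
        agent=AgentConfig(name=str(raw["agent"]["name"])),
    )
    # Assumed check: a training run needs at least one step.
    if cfg.num_train_steps <= 0:
        raise ValueError("num_train_steps must be positive")
    return cfg


# The same values as the example config.yaml, written as a Python dict.
example = {
    "experiment": "drqv2_dmc",
    "device": "cuda:0",
    "seed": 1,
    "num_train_steps": 250000,
    "agent": {"name": "DrQv2"},
}

cfg = validate(example)
print(cfg.experiment, cfg.device, cfg.agent.name)
```

A missing key raises `KeyError` and a non-numeric step count raises `ValueError`, so typos surface before any environment is created.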
Run train.py and you will see the training output.
Alternatively, you can use HsuanwuHub for fast training; it provides a large number of preset RL applications. Install HsuanwuHub with pip:
pip install hsuanwuhub
Then run the following command to perform training directly:
python -m hsuanwuhub.train \
task=drqv2_dmc_pixel \
device=cuda:0 \
num_train_steps=50000
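The `key=value` arguments in the command above are command-line overrides of configuration values. As a rough illustration of the idea only, the following sketch applies such overrides to a nested dictionary. It is a simplified stand-in and not Hydra's actual implementation; `apply_overrides` and `parse_value` are hypothetical helper names.

```python
import copy


def parse_value(text):
    """Convert integer-looking text to int; leave everything else as a string."""
    try:
        return int(text)
    except ValueError:
        return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with `key=value` overrides applied.

    Dotted keys such as `agent.name=DrQv2` address nested dictionaries.
    """
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing levels on demand
        node[leaf] = parse_value(value)
    return result


base = {"device": "cuda:0", "num_train_steps": 250000, "agent": {"name": "DrQv2"}}
merged = apply_overrides(base, ["device=npu:0", "num_train_steps=50000"])
print(merged)
```

Values that are not overridden, such as `agent.name` here, keep the defaults from the base configuration.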
On HUAWEI NPU
Similarly, to train an agent on a HUAWEI NPU, it suffices to override the device on the command line:
python train.py device=npu:0
Then you will see the training output.
See Implemented Modules for the NPU compatibility of each module.
For more detailed tutorials, see https://docs.hsuanwu.dev/tutorials.
Implemented Modules
Roadmap
Hsuanwu evolves alongside reinforcement learning algorithms and integrates the latest tricks. The following figure shows the main evolution roadmap of Hsuanwu:
Project Structure
See the project structure below:
- Common: Auxiliary modules like trainer and logger.
  - Engine: Engine for building Hsuanwu applications.
  - Logger: Logger for managing output information.
- Xploit: Modules that focus on exploitation in RL.
  - Encoder: Neural network-based encoder for processing observations.
  - Agent: Agent for interacting and learning.
  - Storage: Storage for storing collected experiences.
- Xplore: Modules that focus on exploration in RL.
  - Augmentation: PyTorch.nn-like modules for observation augmentation.
  - Distribution: Distributions for sampling actions.
  - Reward: Intrinsic reward modules for enhancing exploration.
- Evaluation: Reasonable and reliable metrics for algorithm evaluation.
- Env: Packaged environments (e.g., Atari games) for fast invocation.
- Pre-training: Methods of pre-training in RL.
- Deployment: Methods of model deployment in RL.
For more detailed descriptions of these modules, see https://docs.hsuanwu.dev/api
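To make the role of a storage module concrete, here is a minimal sketch of a fixed-capacity experience buffer with uniform sampling. It is a generic illustration under assumed names (`ReplayStorage`, `add`, `sample`) and does not reproduce the interface of Hsuanwu's own Storage classes.

```python
import random
from collections import deque


class ReplayStorage:
    """Fixed-capacity FIFO buffer with uniform sampling (hypothetical sketch)."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def add(self, obs, action, reward, next_obs, done):
        """Store one transition collected from the environment."""
        self._buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        """Return `batch_size` distinct transitions chosen uniformly at random."""
        if batch_size > len(self._buffer):
            raise ValueError("not enough transitions stored")
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


storage = ReplayStorage(capacity=3, seed=0)
for step in range(5):
    storage.add(obs=step, action=0, reward=1.0, next_obs=step + 1, done=False)

batch = storage.sample(2)
print(len(storage), batch)
```

Because the buffer only exposes `add` and `sample`, an agent that consumes batches does not need to know how experiences are stored, which is the kind of decoupling the module layout aims for.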
RL Agents
Module | Recurrent | Box | Discrete | MultiBinary | Multi Processing | NPU | Paper | Citations |
---|---|---|---|---|---|---|---|---|
SAC | ❌ | ✔️ | ❌ | ❌ | ❌ | 🐌 | Link | 5077 |
DrQ | ❌ | ✔️ | ❌ | ❌ | ❌ | 🐌 | Link | 433 |
DDPG | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | Link | 11819 |
DrQ-v2 | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | Link | 100 |
PPO | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Link | 11155 |
DrAC | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Link | 29 |
DAAC | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | 🐌 | Link | 56 |
PPG | ❌ | ✔️ | ✔️ | ❌ | ✔️ | 🐌 | Link | 82 |
IMPALA | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | 🐌 | Link | 1219 |
- 🐌: Developing.
- NPU: Supports neural-network processing units.
- Recurrent: Supports recurrent neural networks.
- Box: An N-dimensional box that contains every point in the action space.
- Discrete: A list of possible actions, where only one action can be used at each timestep.
- MultiBinary: A list of possible actions, where any combination of the actions can be used at each timestep.
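The three action-space types in the legend differ in what a single action looks like. The sketch below samples one action of each kind using only the standard library. The function names are illustrative and this is not the environment API that Hsuanwu uses.

```python
import random


def sample_box(low, high, rng):
    """Box: one real value per dimension, each within its [low, high] bounds."""
    return [rng.uniform(lo, hi) for lo, hi in zip(low, high)]


def sample_discrete(n, rng):
    """Discrete: exactly one action index out of n possible actions."""
    return rng.randrange(n)


def sample_multi_binary(n, rng):
    """MultiBinary: an on/off flag for each of n actions, in any combination."""
    return [rng.randint(0, 1) for _ in range(n)]


rng = random.Random(0)
box_action = sample_box(low=[-1.0, -1.0], high=[1.0, 1.0], rng=rng)
discrete_action = sample_discrete(4, rng)
multi_binary_action = sample_multi_binary(4, rng)
print(box_action, discrete_action, multi_binary_action)
```

An agent marked as supporting only Box in the table produces actions like the first sample, so it cannot be applied directly to environments whose actions look like the second or third.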
Intrinsic Reward Modules
- 🐌: Developing.
- Repr.: The method involves representation learning.
- Visual: The method works well in visual RL.
See Tutorials: Use intrinsic reward and observation augmentation for usage examples.
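As a generic illustration of what an intrinsic reward module computes, the following sketch implements a simple count-based exploration bonus that shrinks as a state is revisited. The class `CountBonus` and its `scale` parameter are hypothetical; this is a textbook-style example and not one of Hsuanwu's reward modules.

```python
import math
from collections import Counter


class CountBonus:
    """Count-based bonus: scale / sqrt(N(s)), where N(s) is the visit count of s."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = Counter()  # visit count per (hashable) state

    def __call__(self, state):
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])


bonus = CountBonus(scale=1.0)
first_visit = bonus("s0")   # first visit to s0
second_visit = bonus("s0")  # second visit to s0 earns a smaller bonus
new_state = bonus("s1")     # an unseen state earns the full bonus again

extrinsic_reward = 0.0
total_reward = extrinsic_reward + new_state  # the bonus is added to the task reward
print(first_visit, second_visit, total_reward)
```

Adding the bonus to the environment reward nudges the agent toward rarely visited states while leaving the task reward unchanged.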
Model Zoo
Hsuanwu provides a large number of reusable benchmarks; see https://hub.hsuanwu.dev/ and https://docs.hsuanwu.dev/benchmarks/
API Documentation
View our well-designed documentation: https://docs.hsuanwu.dev/
How To Contribute
Contributions to this project are welcome! Before you begin writing code, please read CONTRIBUTING.md for guidance.
Acknowledgment
This project is supported by FUNDING.yml. Some code in this project is borrowed from or inspired by several excellent projects, and we greatly appreciate them. See ACKNOWLEDGMENT.md.