RLLTE: Long-Term Evolution Project of Reinforcement Learning is inspired by the long-term evolution (LTE) standard project in telecommunications. It aims to track the latest research progress in reinforcement learning (RL) and provide stable and efficient baselines. In rllte, you can find everything you need for RL, such as training, evaluation, and deployment.
If you use rllte in your research, please cite the project as follows:
```bibtex
@software{rllte,
  author = {Mingqi Yuan and Zequn Zhang and Yang Xu and Shihao Luo and Bo Li and Xin Jin and Wenjun Zeng},
  title = {RLLTE: Long-Term Evolution Project of Reinforcement Learning},
  url = {https://github.com/RLE-Foundation/rllte},
  year = {2023},
}
```
- Contents
- Overview
- Quick Start
- Implemented Modules
- Benchmark
- API Documentation
- How To Contribute
- Acknowledgment
For the project tenet, please read Evolution Tenet.
The highlight features of rllte:
- Large language model-empowered copilot;
- Latest algorithms and tricks;
- Standard and sophisticated modules for redevelopment;
- Highly modularized design for complete decoupling of RL algorithms;
- Optimized workflow for full hardware acceleration;
- Support custom environments and modules;
- Support multiple computing devices like GPU and NPU;
- Support RL model engineering deployment (TensorRT, CANN, ...);
- Large number of reusable benchmarks (see rllte-hub);
See the project structure below:
- Agent: Implemented RL agents using rllte building blocks.
- Common: Base classes and auxiliary modules.
- Xploit: Modules that focus on exploitation in RL.
    - Encoder: Neural network-based encoders for processing observations.
    - Policy: Policies for interaction and learning.
    - Storage: Storages for storing collected experiences.
- Xplore: Modules that focus on exploration in RL.
    - Augmentation: PyTorch.nn-like modules for observation augmentation.
    - Distribution: Distributions for sampling actions.
    - Reward: Intrinsic reward modules for enhancing exploration.
- Hub: Fast training API and reusable benchmarks.
- Env: Packaged environments (e.g., Atari games) for fast invocation.
- Evaluation: Reasonable and reliable metrics for algorithm evaluation.
- Pre-training: Methods of pre-training in RL.
- Deployment: Methods of model deployment in RL.
For more detailed descriptions of these modules, see https://docs.rllte.dev/api.
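As a rough illustration of how this structure maps onto the Python package, the sketch below imports one component from several of the submodules listed above. The specific class names (`PPO`, `MnihCnnEncoder`, `RE3`, `RandomCrop`) are assumptions inferred from the module descriptions, so check the API documentation for the definitive list:

```python
# Sketch only: class names below are assumptions based on the module
# descriptions above; see https://docs.rllte.dev/api for the actual API.
from rllte.agent import PPO                        # Agent: ready-to-use RL agents
from rllte.env import make_atari_env               # Env: packaged environments
from rllte.xploit.encoder import MnihCnnEncoder    # Xploit: observation encoders
from rllte.xplore.reward import RE3                # Xplore: intrinsic rewards
from rllte.xplore.augmentation import RandomCrop   # Xplore: observation augmentation
```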
- Prerequisites
Currently, we recommend Python>=3.8, and users can create a virtual environment by running:
```sh
conda create -n rllte python=3.8
```
- with pip (recommended)

Open up a terminal and install rllte with pip:
```sh
pip install rllte-core # basic installation
pip install rllte-core[envs] # for pre-defined environments
```
- with git

Open up a terminal and clone the repository from GitHub with git:
```sh
git clone https://github.com/RLE-Foundation/rllte.git
```
After that, run the following commands to install the package and dependencies:
```sh
pip install -e . # basic installation
pip install -e .[envs] # for pre-defined environments
```
For more detailed installation instructions, see https://docs.rllte.dev/getting_started.
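To verify the installation, a plain import should succeed (the `__version__` attribute is an assumption, hence the guarded lookup):

```python
# Minimal installation check: an ImportError here means the install failed.
import rllte

print(getattr(rllte, "__version__", "rllte imported successfully"))
```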
For example, if we want to use DrQ-v2 to solve a task from the DeepMind Control Suite, it suffices to write a train.py like the following:
```python
# import `env` and `agent` APIs
from rllte.env import make_dmc_env
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "cuda:0"
    # create env, `eval_env` is optional
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # create agent
    agent = DrQv2(env=env,
                  eval_env=eval_env,
                  device=device,
                  tag="drqv2_dmc_pixel")
    # start training
    agent.train(num_train_steps=5000)
```
Run train.py and you will see the training progress printed to the terminal.
Similarly, if we want to train an agent on a HUAWEI NPU, it suffices to replace cuda:0 with npu:0:
```python
# import `env` and `agent` APIs
from rllte.env import make_dmc_env
from rllte.agent import DrQv2

if __name__ == "__main__":
    device = "npu:0"
    # create env, `eval_env` is optional
    env = make_dmc_env(env_id="cartpole_balance", device=device)
    eval_env = make_dmc_env(env_id="cartpole_balance", device=device)
    # create agent
    agent = DrQv2(env=env,
                  eval_env=eval_env,
                  device=device,
                  tag="drqv2_dmc_pixel")
    # start training
    agent.train(num_train_steps=5000)
```
Then training will run on the NPU and produce similar output.
Please refer to Implemented Modules for NPU compatibility.
For more detailed tutorials, see https://docs.rllte.dev/tutorials.
| Type | Module | Recurrent | Box | Discrete | MultiBinary | Multi Processing | NPU | Paper | Citations |
|---|---|---|---|---|---|---|---|---|---|
| Original | SAC | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | Link | 5077 |
| | DDPG | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | Link | 11819 |
| | PPO | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Link | 11155 |
| | DAAC | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Link | 56 |
| | IMPALA | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ❌ | Link | 1219 |
| Augmented | DrQ-v2 | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | Link | 100 |
| | DrQ | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ | Link | 433 |
| | DrAC | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Link | 29 |
- DrQ=SAC+Augmentation, DrQ-v2=DDPG+Augmentation, DrAC=PPO+Augmentation.
- 🐌: Developing.
- NPU: Support Neural-network Processing Unit.
- Recurrent: Support recurrent neural network.
- Box: An N-dimensional box that contains every point in the action space.
- Discrete: A list of possible actions, where only one action can be used at each timestep.
- MultiBinary: A list of possible actions, where any combination of actions can be used at each timestep.
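For instance, the table indicates that PPO supports Discrete action spaces, so it can be trained on Atari games with the same workflow as the DMC example above. The sketch below assumes `make_atari_env` mirrors the `make_dmc_env` interface and that `Alien-v5` is a valid environment id; check the env API for the exact arguments:

```python
from rllte.env import make_atari_env
from rllte.agent import PPO

if __name__ == "__main__":
    device = "cuda:0"
    # Atari games expose Discrete action spaces, which PPO supports (see table above)
    env = make_atari_env(env_id="Alien-v5", device=device)
    agent = PPO(env=env, device=device, tag="ppo_atari")
    agent.train(num_train_steps=5000)
```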
- 🐌: Developing.
- Repr.: The method involves representation learning.
- Visual: The method works well in visual RL.
See Tutorials: Use Intrinsic Reward and Observation Augmentation for usage examples.
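As a rough sketch of what those tutorials cover, the snippet below plugs an intrinsic reward and an observation augmentation into an agent. The class names `RE3` and `RandomCrop`, their constructor arguments, and the `agent.set(...)` hook are assumptions based on the Xplore module descriptions above; follow the linked tutorials for the exact API:

```python
from rllte.env import make_atari_env
from rllte.agent import PPO
from rllte.xplore.reward import RE3                  # intrinsic reward module (assumed name)
from rllte.xplore.augmentation import RandomCrop     # observation augmentation (assumed name)

if __name__ == "__main__":
    device = "cuda:0"
    env = make_atari_env(env_id="Alien-v5", device=device)
    agent = PPO(env=env, device=device, tag="ppo_atari_re3")
    # attach an intrinsic reward and an augmentation to the agent (assumed `set` hook)
    agent.set(reward=RE3(observation_space=env.observation_space,
                         action_space=env.action_space,
                         device=device),
              augmentation=RandomCrop())
    agent.train(num_train_steps=5000)
```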
rllte provides a large number of reusable benchmarks; see https://hub.rllte.dev/ and https://docs.rllte.dev/benchmarks/.
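As a purely hypothetical sketch of pulling benchmark data from the hub (the `rllte.hub.datasets` path and the `load_scores()` call are assumptions; see https://hub.rllte.dev/ for the actual interface):

```python
# Hypothetical: module path and method names are assumptions, not the confirmed API.
from rllte.hub.datasets import Procgen

if __name__ == "__main__":
    scores = Procgen().load_scores()   # e.g., final scores of trained agents per task
    print(type(scores))
```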
View our well-designed documentation: https://docs.rllte.dev/
Contributions to this project are welcome! Before you begin writing code, please read CONTRIBUTING.md for guidance.
This project is supported by The Hong Kong Polytechnic University, Eastern Institute for Advanced Study, and FLW-Foundation. EIAS HPC provides a GPU computing platform, and Ascend Community provides an NPU computing platform for our testing. Some code in this project is borrowed from or inspired by several excellent projects, and we highly appreciate them. See ACKNOWLEDGMENT.md.