Lightweight and Scalable Deep Reinforcement Learning Using PyTorch
ElegantRL is developed for researchers and practitioners and offers the following advantages:
- Lightweight: the core code is under 1,000 lines (see elegantrl/tutorial) and uses PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plotting).
- Efficient: more efficient than Ray RLlib in many test cases.
- Stable: much more stable than Stable Baselines3.
ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:
- DDPG, TD3, SAC, A2C, PPO, PPO(GAE) for continuous actions
- DQN, DoubleDQN, D3QN for discrete actions
For the details of DRL algorithms, please check out the educational webpage OpenAI Spinning Up.
News
- [Towardsdatascience] ElegantRL: A Lightweight and Stable Deep Reinforcement Learning Library
- [Towardsdatascience] ElegantRL: Mastering PPO Algorithms
- [MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part I)
- [MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part II)
File Structure
An agent (in agent.py) uses networks (in net.py) and is trained (in run.py) by interacting with an environment (in env.py).
-----kernel files----
- elegantrl/net.py # Neural networks.
- Q-Net,
- Actor network,
- Critic network,
- elegantrl/agent.py # RL algorithms.
- AgentBase,
- elegantrl/run.py # run DEMO 1 ~ 4
- Parameter initialization,
- Training loop,
- Evaluator.
-----utils files----
- elegantrl/envs/ # gym env or custom env, including FinanceStockEnv.
- gym_utils.py: A PreprocessEnv class for gym-environment modification.
- Stock_Trading_Env: A self-created stock trading environment as an example for user customization.
- eRL_demo_BipedalWalker.ipynb # BipedalWalker-v2 in a Jupyter notebook.
- eRL_demos.ipynb # Demos 1~4 in a Jupyter notebook. Shows how to use the tutorial version and the advanced version.
- eRL_demo_SingleFilePPO.py # Trains PPO from a single file; simpler than the tutorial version.
- eRL_demo_StockTrading.py # Stock Trading Application in jupyter notebooks
At a high level:
- 1). Instantiate an environment in env.py and an agent in agent.py, with an Actor network and a Critic network from net.py;
- 2). In each training step in run.py, the agent interacts with the environment and generates transitions that are stored in a Replay Buffer;
- 3). The agent fetches a batch of transitions from the Replay Buffer to train its networks;
- 4). After each update, an evaluator measures the agent's performance (e.g., fitness score or cumulative return) and saves the agent if the performance is good.
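The Replay Buffer in steps 2 and 3 can be pictured with a minimal sketch. TinyReplayBuffer below is a hypothetical class written for illustration with the Python standard library only; it is not ElegantRL's ReplayBuffer, and its method names and the (state, action, reward, next_state, done) tuple layout are assumptions.

```python
import random
from collections import deque


class TinyReplayBuffer:
    """Hypothetical fixed-size buffer; not ElegantRL's ReplayBuffer."""

    def __init__(self, max_size):
        # deque(maxlen=...) drops the oldest transition once the buffer is full.
        self.storage = deque(maxlen=max_size)

    def append(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current size.
        batch_size = min(batch_size, len(self.storage))
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


buffer = TinyReplayBuffer(max_size=4)
for step in range(6):
    # Dummy transition: the state is just the step index.
    buffer.append(step, 0, 1.0, step + 1, False)

batch = buffer.sample(batch_size=3)
print(len(buffer), len(batch))
```

Because the buffer holds at most four transitions, the two oldest ones are discarded after six insertions, which is the usual behaviour of a first-in, first-out replay buffer for off-policy training.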
Training Pipeline
Initialization:
- hyper-parameters args.
- env = PreprocessEnv(): creates an environment (in the OpenAI gym format).
- agent = agent.XXX(): creates an agent for a DRL algorithm.
- buffer = ReplayBuffer(): stores the transitions.
- evaluator = Evaluator(): evaluates and stores the trained model.
Then, the training process is controlled by a while-loop:
- agent.explore_env(…): the agent explores the environment within target steps, generates transitions, and stores them into the ReplayBuffer.
- agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
- evaluator.evaluate_save(…): evaluates the agent's performance and keeps the trained model with the highest score.
The while-loop terminates when a stopping condition is met, e.g., the agent reaches a target score, the maximum number of steps is exceeded, or the user breaks out manually.
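The control flow of this loop can be sketched in plain Python. ToyAgent, ToyEvaluator, and train are hypothetical stand-ins written for this illustration; only the method names explore_env, update_net, and evaluate_save come from the description above, and their signatures, the single "skill" number that replaces real network weights, and the stopping parameters are all assumptions.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for a DRL agent; no real networks involved."""

    def __init__(self):
        self.skill = 0.0  # one number standing in for network parameters

    def explore_env(self, target_step):
        # Pretend to collect `target_step` transitions from an environment.
        return [(step, random.random()) for step in range(target_step)]

    def update_net(self, batch):
        # Pretend that every update improves the policy by a fixed amount.
        self.skill += 1.0


class ToyEvaluator:
    """Hypothetical evaluator that remembers the best score seen so far."""

    def __init__(self):
        self.best_score = float("-inf")

    def evaluate_save(self, agent):
        score = agent.skill  # placeholder for an averaged episode return
        if score > self.best_score:
            self.best_score = score  # a real evaluator would save the model here
        return score


def train(target_score, max_total_step, target_step=10, batch_size=4):
    agent, evaluator, buffer = ToyAgent(), ToyEvaluator(), []
    total_step = 0
    while True:
        transitions = agent.explore_env(target_step)
        buffer.extend(transitions)
        total_step += len(transitions)
        batch = random.sample(buffer, min(batch_size, len(buffer)))
        agent.update_net(batch)
        score = evaluator.evaluate_save(agent)
        # Stop on either condition: target score reached or step budget spent.
        if score >= target_score or total_step >= max_total_step:
            break
    return evaluator.best_score, total_step


print(train(target_score=5.0, max_total_step=10_000))
```

Checking both conditions in one place keeps the loop body identical regardless of which limit ends training first.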
Experimental Results
Results using ElegantRL
BipedalWalkerHardcore is a difficult task in a continuous action space. Only a few RL implementations can reach the target reward.
Check out a video on bilibili: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.
Requirements
Necessary:
| Python 3.6+ |
| PyTorch 1.6+ |
Not necessary:
| NumPy 1.18+ | For ReplayBuffer. NumPy is installed along with PyTorch.
| gym 0.17.0 | For env. Gym provides tutorial environments for DRL training. (env.render() has a bug with gym==0.18 and pyglet==1.6; use gym==0.17.0 and pyglet==1.5 instead.)
| pybullet 2.7+ | For env. We use PyBullet (free) as an alternative to MuJoCo (not free).
| box2d-py 2.3.8 | For gym. Use pip install Box2D (instead of box2d-py).
| matplotlib 3.2 | For plots. Used to evaluate the agent's performance.
pip3 install gym==0.17.0 pybullet Box2D matplotlib
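To confirm that an installed version satisfies one of the minimums listed above, a small helper along these lines may be useful. parse_version and meets_minimum are hypothetical names written for this sketch, and the parser only looks at the first three numeric components, which is an assumption that holds for simple version strings such as 1.6.0 or 1.6.0+cu101.

```python
import re
import sys


def parse_version(text):
    """Turn '1.6.0+cu101' into (1, 6, 0); the local build tag is ignored."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def meets_minimum(installed, minimum):
    # Tuple comparison is element-wise, so (1, 10, 2) >= (1, 6) holds.
    return parse_version(installed) >= parse_version(minimum)


python_version = ".".join(str(n) for n in sys.version_info[:3])
print("Python OK:", meets_minimum(python_version, "3.6"))
# With PyTorch installed, torch.__version__ could be checked the same way:
# meets_minimum(torch.__version__, "1.6")
```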
Citation:
To cite this repository:
@misc{erl,
author = {Xiao-Yang Liu and Zechu Li and Zhaoran Wang and Jiahao Zheng},
title = {ElegantRL: A Lightweight and Scalable Deep Reinforcement Learning Library},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/AI4Finance-Foundation/ElegantRL}},
}