This is the open-source implementation of several multi-step deep RL algorithms discussed in the ICML 2021 paper. The algorithms are implemented in PyTorch, mainly in combination with TD3 and SAC, two popular actor-critic algorithms for continuous control. We also apply the multi-step deep RL algorithms to a financial environment inspired by FinRL.
This code implements several multi-step algorithms that aim to reduce bias and variance, including:
- Peng's Q(lambda)
- Uncorrected n-step
- Retrace
- Tree-backup
- Undamped Importance Sampling
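As a rough illustration of how three of these methods differ, Retrace, Tree-backup and importance sampling can each be written as a choice of per-step trace coefficient c_t in the general off-policy multi-step target Q(x_0, a_0) + sum_t gamma^t (c_1 ... c_t) delta_t, where delta_t is the one-step TD error. The sketch below is not the code in trace_main.py: the function names `trace_coefficient` and `multistep_target`, the argument names, and the use of NumPy on a single trajectory are all assumptions made for illustration. Peng's Q(lambda) is usually written as a lambda-return rather than in this form, so it is left out here.

```python
import numpy as np


def trace_coefficient(trace_type, pi, mu, lambda_=1.0):
    """Per-step trace coefficients c_t for one trajectory (hypothetical helper).

    pi, mu: probabilities of the taken actions under the target policy
    and the behaviour policy, one entry per time step.
    """
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = pi / mu
    if trace_type == "retrace":
        # Truncated importance ratio: never larger than lambda_.
        return lambda_ * np.minimum(1.0, ratio)
    if trace_type == "treebackup":
        # Uses only the target-policy probability.
        return lambda_ * pi
    if trace_type == "IS":
        # Undamped importance sampling: the raw ratio.
        return ratio
    raise ValueError(f"unknown trace_type: {trace_type}")


def multistep_target(rewards, q, v_next, c, gamma=0.99):
    """Multi-step target for Q(x_0, a_0) (hypothetical helper).

    q[t]      = Q(x_t, a_t)
    v_next[t] = expected Q at x_{t+1} under the target policy
    c[t]      = trace coefficient at step t (c[0] is not used)
    """
    rewards, q, v_next, c = (
        np.asarray(x, dtype=float) for x in (rewards, q, v_next, c)
    )
    delta = rewards + gamma * v_next - q  # one-step TD errors
    # Running product c_1 * ... * c_t, with the empty product 1 at t = 0.
    weights = np.concatenate(([1.0], np.cumprod(c[1:])))
    discounts = gamma ** np.arange(len(rewards))
    return q[0] + np.sum(discounts * weights * delta)
```

With a single step the target reduces to the usual one-step backup r_0 + gamma * v_next[0], and a zero coefficient at step t cuts off every TD error from step t onward.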
Please first specify the data paths in trace_main.py:
train_trade_path = '..'
val_trade_path = '..'
train_feature_path = '..'
val_feature_path = '..'
With a look-back window of length 10 (l=10), to run Retrace-SAC with an n-step buffer of n=60, run the following:
python trace_main.py --trace_type='retrace' --lambda_=0.8 --nsteps=60 --look_back=10
To run Peng's Q(lambda) with the same settings, run
python trace_main.py --trace_type='qlambda' --lambda_=0.8 --nsteps=60 --look_back=10
To run Tree-backup, run
python trace_main.py --trace_type='treebackup' --nsteps=60 --look_back=10
To run uncorrected n-step, run
python trace_main.py --trace_type='qlambda' --lambda_=1.0 --nsteps=60 --look_back=10
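The reason `--trace_type='qlambda'` with `--lambda_=1.0` gives uncorrected n-step can be seen from the lambda-return form of Peng's Q(lambda). The following is a minimal sketch under stated assumptions, not the repository's implementation: `peng_q_lambda_target` and its arguments are hypothetical names, and `v_next[t]` stands for whatever bootstrap value the critic provides at state x_{t+1} (any SAC entropy term is ignored here).

```python
def peng_q_lambda_target(rewards, v_next, lambda_, gamma=0.99):
    """Lambda-return over an n-step window (hypothetical helper).

    rewards[t] = r_t, v_next[t] = bootstrap value at x_{t+1}.
    Recursion: G_t = r_t + gamma * ((1 - lambda_) * v_next[t] + lambda_ * G_{t+1}),
    with the return beyond the window replaced by v_next[-1].
    """
    g = v_next[-1]  # bootstrap at the end of the window
    for r, v in zip(reversed(rewards), reversed(v_next)):
        g = r + gamma * ((1.0 - lambda_) * v + lambda_ * g)
    return g
```

Setting `lambda_=1.0` drops every intermediate bootstrap, leaving the discounted sum of rewards plus a single bootstrap at the end of the window, which is the uncorrected n-step return. Setting `lambda_=0.0` recovers the one-step target.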
To run pure Importance Sampling, run
python trace_main.py --trace_type='IS' --lambda_=1.0 --nsteps=60 --look_back=10
To run 1-step SAC, run
python trace_main.py --nsteps=1 --look_back=10
To run 1-step TD3, run
python trace_main.py --policy_type='Deterministic' --nsteps=1 --look_back=10
Some commonly used arguments:
python trace_main.py --policy_type [str:"Gaussian", "Deterministic"] \
--trace_type [str:"retrace","qlambda", "treebackup","IS"] \
--model_type [str:"transformer","lstm","fc"] \
--look_back [int:1-50] \
--cuda [bool:True or False] \
--nsteps [int:1-120] \
--lambda_ [float:0-1] \
--lr_actor [float:] \
--lr_critic [float:] \
--lr_alpha [float:] \
--episodes [int:1-1000] \
--reps [int:5-10]
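For readers who want to script runs, the options above could be parsed along the following lines. This is only a sketch of an `argparse` setup consistent with the flags listed; it is not taken from trace_main.py. The helper names `str2bool` and `build_parser` are hypothetical, and every default is an assumption, except that `Gaussian` is presumably the default policy type since the 1-step SAC command omits `--policy_type`.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style strings (hypothetical helper)."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # All defaults below are assumptions made for illustration.
    parser = argparse.ArgumentParser(description="Multi-step deep RL (sketch)")
    parser.add_argument("--policy_type", choices=["Gaussian", "Deterministic"],
                        default="Gaussian")
    parser.add_argument("--trace_type",
                        choices=["retrace", "qlambda", "treebackup", "IS"],
                        default="retrace")
    parser.add_argument("--model_type", choices=["transformer", "lstm", "fc"],
                        default="fc")
    parser.add_argument("--look_back", type=int, default=10)
    parser.add_argument("--cuda", type=str2bool, default=False)
    parser.add_argument("--nsteps", type=int, default=1)
    parser.add_argument("--lambda_", type=float, default=1.0)
    parser.add_argument("--lr_actor", type=float, default=None)
    parser.add_argument("--lr_critic", type=float, default=None)
    parser.add_argument("--lr_alpha", type=float, default=None)
    parser.add_argument("--episodes", type=int, default=100)
    parser.add_argument("--reps", type=int, default=5)
    return parser


# Example: the Retrace-SAC settings used earlier, passed as an explicit list.
args = build_parser().parse_args(
    ["--trace_type=retrace", "--lambda_=0.8", "--nsteps=60", "--look_back=10"]
)
```

Using `choices` makes a misspelled trace or model type fail at parse time rather than partway through training.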
Finally, we suggest using TensorBoard to visualize the loss during training:
tensorboard --logdir=./runs