off-policy-continuous-control: A Python repository from Shaza-Is

Recurrent Off-policy Baselines for Memory-based Continuous Control

This repo is the official codebase for the following paper:

@article{yang2021recurrent,
  title={Recurrent Off-policy Baselines for Memory-based Continuous Control},
  author={Yang, Zhihan and Nguyen, Hai},
  journal={Deep RL Workshop, NeurIPS 2021},
  year={2021}
}

Paper summary: We implement and benchmark recurrent versions of DDPG, TD3 and SAC that use the full observation history.

This repo offers:

DDPG, TD3 and SAC (clean PyTorch implementations, benchmarked against stable-baselines3*)
Recurrent versions of DDPG, TD3 and SAC that use full history: RDPG, RTD3 and RSAC
Very easy to understand and use; see our exhaustive documentation: link

*The benchmarking results can be found in the closed issue "Performance check against SB3".
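
As a rough illustration of what "recurrent" and "full history" mean for the agents listed above, here is a minimal PyTorch sketch of an actor that summarizes all observations seen so far with an LSTM and maps the summary to an action. It is not taken from this repo: the class name RecurrentActor, the layer sizes, and the choice of an LSTM as the summarizer are assumptions made only for this example.

```python
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Hypothetical actor that conditions on the observation history (illustration only)."""

    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        # The LSTM compresses o_1, ..., o_t into a hidden summary for each step t.
        self.summarizer = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # squash actions into [-1, 1]
        )

    def forward(self, obs_seq, hidden=None):
        # obs_seq has shape (batch, time, obs_dim).
        summary, hidden = self.summarizer(obs_seq, hidden)
        return self.head(summary), hidden


torch.manual_seed(0)
actor = RecurrentActor(obs_dim=3, action_dim=2)
episodes = torch.randn(4, 10, 3)  # 4 made-up episodes of 10 steps each

with torch.no_grad():
    # Training-style pass: process whole episodes at once.
    actions, _ = actor(episodes)

    # Acting-style pass: feed one observation at a time and carry the hidden state.
    hidden = None
    step_actions = []
    for t in range(episodes.shape[1]):
        a_t, hidden = actor(episodes[:, t:t + 1], hidden)
        step_actions.append(a_t)
    stepwise = torch.cat(step_actions, dim=1)
```

Both passes should produce the same actions up to floating-point error, since carrying the hidden state forward is equivalent to re-reading the whole history at every step.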

For users:

Please feel free to ask questions about the code through Issues.
When cloning this repo, please consider a shallow clone (for example, git clone --depth 1 followed by the repo URL); the full history is large because of the many commits.

News:

[2021/11/18] The documentation was missing the list of dependencies; it has now been added.

Paper link:

https://arxiv.org/pdf/2110.12628.pdf
