rl_lib

Utilities for reinforcement learning (RL).

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

The goal is to build an efficient learner that I can reuse in my other projects.

We use:

  • the 'soft Watkins' TD update (from Human-level Atari 200x faster), which helps correct for off-policy actions and allows the use of multi-step returns.
  • an exponential moving average (EMA) target network to help stabilise training. I haven't seen this elsewhere, but I haven't looked properly. It still needs to be evaluated (WIP).
  • (TODOs) uncertainty + discount / exploration / multiagent / reward normalisation / etc
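
As a rough illustration of the EMA target network idea, here is a minimal sketch in plain Python. It is not the code from this repo: the function name `ema_update`, the step size `tau`, and the nested dict/list parameter layout are all assumptions made for the example. The target parameters are moved a small step towards the online parameters after each learner update, rather than being copied over every N steps.

```python
def ema_update(target_params, online_params, tau=0.01):
    """Return new target params: (1 - tau) * target + tau * online.

    Hypothetical helper: params are nested dicts/lists/tuples of floats
    with the same structure in both arguments.
    """
    if isinstance(online_params, dict):
        return {k: ema_update(target_params[k], v, tau)
                for k, v in online_params.items()}
    if isinstance(online_params, (list, tuple)):
        return type(online_params)(
            ema_update(t, o, tau)
            for t, o in zip(target_params, online_params))
    # Leaf: blend the old target value with the current online value.
    return (1.0 - tau) * target_params + tau * online_params


# Toy parameters, made up for the example.
online = {"w": [1.0, 2.0], "b": 0.5}
target = {"w": [0.0, 0.0], "b": 0.0}

# One EMA step: the target moves 10% of the way towards the online params.
target = ema_update(target, online, tau=0.1)
```

With `tau=1.0` this reduces to a hard copy of the online parameters, and with `tau=0.0` the target never changes, so `tau` trades off how quickly the target tracks the learner against how stable the bootstrap targets are.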

There are also some replay buffers implemented using Reverb.

The code style is inspired by (and partly copied from) the rlax examples.