The goal is to build an efficient learner that I can reuse in my other projects.
We use:
- the 'soft Watkins' TD update (from Human-level Atari 200x faster) to correct for off-policy actions while still allowing multi-step returns; see the sketch after this list.
- an exponential moving average target network to help stabilise training, also sketched below (I haven't seen this elsewhere, but haven't looked properly; it still needs to be evaluated -- WIP).
- (TODOs) uncertainty + discount / exploration / multi-agent / reward normalisation / etc.
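For reference, here is a minimal sketch of the soft Watkins update in rlax terms, assuming the MEME-style trace coefficient: the trace is kept wherever the behaviour action's value is within a tolerance `kappa` of the greedy value, rather than hard-cut on every non-greedy action. The function name, `kappa`, and the default values are illustrative, not this repo's actual API.

```python
import jax.numpy as jnp
import rlax

def soft_watkins_td_error(q_tm1, a_tm1, r_t, discount_t, q_t, a_t,
                          lambda_=0.7, kappa=0.01):
  """TD errors for a soft Watkins Q(lambda) target over one [T]-step sequence.

  q_tm1: [T, A] online Q-values at states s_0..s_{T-1}.
  a_tm1: [T]    actions taken at s_0..s_{T-1}.
  r_t, discount_t: [T] rewards and discounts for each transition.
  q_t:   [T, A] target-network Q-values at states s_1..s_T.
  a_t:   [T]    actions taken at s_1..s_T.
  """
  v_t = jnp.max(q_t, axis=-1)  # greedy bootstrap values
  q_a_t = jnp.take_along_axis(q_t, a_t[:, None], axis=-1)[:, 0]
  # Soft trace cutting: keep lambda where the taken action is near-greedy
  # (within kappa * |max Q|), instead of cutting on any non-greedy action
  # as classic Watkins Q(lambda) does.
  trace = lambda_ * (q_a_t >= v_t - kappa * jnp.abs(v_t))
  target_t = rlax.lambda_returns(r_t, discount_t, v_t, lambda_=trace,
                                 stop_target_gradients=True)
  q_a_tm1 = jnp.take_along_axis(q_tm1, a_tm1[:, None], axis=-1)[:, 0]
  return target_t - q_a_tm1
```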
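The target update itself is Polyak-style averaging of the online parameters into the target parameters (soft updates of this form also appear in e.g. DDPG); a minimal sketch using optax, where the function name and `tau` value are illustrative:

```python
import optax

def ema_target_update(online_params, target_params, tau=0.005):
  """EMA target update: target <- tau * online + (1 - tau) * target.

  Called once per learner step, in place of a periodic hard copy.
  """
  return optax.incremental_update(online_params, target_params, step_size=tau)
```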
There are also some replay buffers implemented using Reverb:
- a replay buffer supporting multi-step returns (sketched after this list),
- a multi-agent replay buffer,
- a replay buffer supporting offline / prior data (from Efficient Online Reinforcement Learning with Offline Data); see the mixed-sampling sketch below.
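For illustration, a multi-step table might be wired up roughly like this; the table name, sizes, selectors, and dummy data are assumptions, not this repo's actual configuration:

```python
import numpy as np
import reverb

N_STEP = 5  # illustrative

server = reverb.Server(tables=[
    reverb.Table(
        name='n_step_replay',
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=100_000,
        rate_limiter=reverb.rate_limiters.MinSize(1_000),
    )
])
client = reverb.Client(f'localhost:{server.port}')

# The trajectory writer keeps the last N_STEP appended steps referenceable,
# so each create_item call can cover an overlapping n-step window.
with client.trajectory_writer(num_keep_alive_refs=N_STEP) as writer:
  for t in range(10):
    writer.append({
        'obs': np.zeros(4, np.float32),
        'action': np.int32(0),
        'reward': np.float32(1.0),
    })
    if t >= N_STEP - 1:
      writer.create_item(
          table='n_step_replay',
          priority=1.0,
          trajectory={
              'obs': writer.history['obs'][-N_STEP:],
              'action': writer.history['action'][-N_STEP:],
              'reward': writer.history['reward'][-N_STEP:],
          })
```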
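The offline / prior-data buffer follows that paper's symmetric-sampling idea: each training batch is drawn half from online replay and half from a table pre-loaded with the prior data. A sketch continuing the setup above (the `offline_replay` table name, address, and 50/50 split are illustrative):

```python
import reverb

BATCH_SIZE = 256
client = reverb.Client('localhost:8000')  # illustrative address

# Half the batch from online experience, half from the pre-loaded offline
# table, mixed before computing the TD update.
online = list(client.sample('n_step_replay', num_samples=BATCH_SIZE // 2))
offline = list(client.sample('offline_replay', num_samples=BATCH_SIZE // 2))
batch = online + offline
```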
Code is inspired in style by (/ copied from) the rlax examples.