RUDDER (Return Decomposition for Delayed Rewards) efficiently learns optimal policies in finite Markov decision processes with delayed rewards. The following links provide further resources:
- Our RUDDER paper: https://arxiv.org/abs/1806.07857
- RUDDER blog: https://ml-jku.github.io/rudder/
- Code for the RUDDER demonstration on the example task from the blog: https://github.com/ml-jku/rudder-demonstration-code
- A practical step-by-step guide to applying RUDDER in PyTorch: https://github.com/widmi/rudder-a-practical-tutorial