RUDDER: Return Decomposition for Delayed Rewards

RUDDER efficiently learns optimal policies in finite Markov decision processes with delayed rewards. The following links lead to: