simoninithomas/Deep_reinforcement_learning_Course

numerical stability of preprocessing function

Theophylline opened this issue · 1 comment

Hi Simon,

Thank you for your tutorial. Are there any other ways to normalize the discounted reward if there's a good chance that the agent will not receive any reward during the beginning stages of training?

For example, (x - x.mean()) / x.std() will blow up if the agent receives no reward in an episode: every entry of x is zero, so the standard deviation is zero and the normalization divides by zero. Thanks for your help.

Solved by adding 1 to the reward from each state.
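A common alternative (not from this thread, just a sketch) is to add a small epsilon to the denominator so the normalization stays finite even when all rewards in an episode are zero; the `normalize_rewards` helper and `eps` value below are illustrative assumptions:

```python
import numpy as np

def normalize_rewards(discounted, eps=1e-8):
    """Normalize discounted rewards; eps guards against a zero std."""
    discounted = np.asarray(discounted, dtype=np.float64)
    return (discounted - discounted.mean()) / (discounted.std() + eps)

# All-zero rewards: plain (x - x.mean()) / x.std() would divide by zero
# (std == 0), but the eps term keeps the result finite.
print(normalize_rewards([0.0, 0.0, 0.0]))  # → [0. 0. 0.]
```

This avoids shifting the reward scale, which adding a constant to every reward does.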