numerical stability of preprocessing function
Theophylline opened this issue · 1 comment
Theophylline commented
Hi Simon,
Thank you for your tutorial. Are there any other ways to normalize the discounted reward if there's a good chance that the agent will not receive any reward during the beginning stages of training?
For example, (x - x.mean()) / x.std() will blow up if the agent does not receive any reward in an episode, since the standard deviation of an all-zero return vector is zero and the division produces NaNs. Thanks for your help.
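For reference, one common guard (not from the tutorial itself; the function name and epsilon value below are just illustrative) is to add a small epsilon to the denominator so the division stays finite even when every return in the episode is zero:

```python
import numpy as np

def normalize_returns(discounted_rewards, eps=1e-8):
    """Standardize discounted returns; eps keeps the denominator
    nonzero when the episode contains no reward at all."""
    r = np.asarray(discounted_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# An all-zero episode now yields zeros instead of NaN/inf.
print(normalize_returns([0.0, 0.0, 0.0]))
```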
Theophylline commented
Solved by adding 1 to the reward from each state.
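A minimal sketch of that workaround (variable names are illustrative): shifting every per-step reward by a constant means the discounted returns are no longer identically zero, so the standard deviation in the normalization step stays nonzero.

```python
# Example episode with no reward signal.
episode_rewards = [0.0, 0.0, 0.0]

# Shift each reward by +1 before computing discounted returns.
shifted_rewards = [r + 1.0 for r in episode_rewards]
```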