numerical stability of preprocessing function
Theophylline opened this issue · 1 comment
Theophylline commented
Hi Simon,
Thank you for your tutorial. Are there any other ways to normalize the discounted reward if there's a good chance that the agent will not receive any reward during the beginning stages of training?
For example, (x - x.mean()) / x.std() will blow up if the agent does not receive any reward in an episode, since the standard deviation of an all-zero return vector is zero and the division produces NaNs. Thanks for your help.
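For reference, one common guard (not from the tutorial itself; the function name and epsilon value below are just illustrative) is to add a small epsilon to the denominator so the division stays finite even when every return in the episode is zero:

```python
import numpy as np

def normalize_returns(discounted_rewards, eps=1e-8):
    """Standardize discounted returns; eps keeps the denominator
    nonzero when the episode contains no reward at all."""
    r = np.asarray(discounted_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# An all-zero episode now yields zeros instead of NaN/inf.
print(normalize_returns([0.0, 0.0, 0.0]))
```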
Theophylline commented
Solved by adding 1 to the reward from each state.
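A minimal sketch of that workaround (variable names are illustrative): shifting every per-step reward by a constant means the discounted returns are no longer identically zero, so the standard deviation in the normalization step stays nonzero.

```python
# Example episode with no reward signal.
episode_rewards = [0.0, 0.0, 0.0]

# Shift each reward by +1 before computing discounted returns.
shifted_rewards = [r + 1.0 for r in episode_rewards]
```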