awjuliani/DeepRL-Agents

# For Policy Network Problem

Closed this issue · 3 comments

Thanks for your code, but I have a question: if the rewards are negative, does the code still work?
If not, how can I fix it or ensure the loss stays positive?

Hi,
Rewards can be negative (if they represent a penalty).

For example, in the basic scenario:
`living_reward = -1`

which means the agent receives a negative reward on every step and therefore has to achieve its goal as quickly as possible.

Thanks a lot.
But I still have a question: to what value should the loss function converge? A negative value, or zero?

Hi cumttang,

Which RL algorithm are you referring to? In all cases the loss function is designed to support both positive and negative rewards, and having negative rewards should not interfere with training at all, as long as the environment's reward function is sensible.
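To make this concrete, here is a minimal NumPy sketch (not the repo's actual code) of the REINFORCE-style policy-gradient loss, `-log π(a|s) * advantage`. A negative advantage simply flips the sign of the gradient contribution, pushing the policy away from that action, so nothing breaks when rewards are negative. The function and variable names here are illustrative, not from the repository:

```python
import numpy as np

def policy_gradient_loss(log_probs, advantages):
    """REINFORCE-style loss: mean of -log pi(a|s) * advantage.

    Negative advantages flip the sign of each term, discouraging
    those actions; the loss itself may be positive or negative.
    """
    return -np.mean(log_probs * advantages)

# Log-probabilities of the actions actually taken (always <= 0)
log_probs = np.log(np.array([0.5, 0.2, 0.8]))
# Advantages can be negative; the loss remains well-defined
advantages = np.array([1.0, -2.0, 0.5])

loss = policy_gradient_loss(log_probs, advantages)
print(loss)
```

Note that the loss can itself go negative here, which is why its converged value is not a meaningful target: what matters is that the episode return improves, not that the loss approaches zero.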

One thing to keep in mind, though, is that methods such as DQN and A3C have issues with overly large rewards (either positive or negative). It is recommended that rewards fed into the network have a magnitude no greater than 1.
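A common way to enforce this is to clip rewards to [-1, 1] before they reach the network, as was done in the original DQN work on Atari. A minimal sketch (the helper name is illustrative, not from this repo):

```python
import numpy as np

def clip_reward(r, bound=1.0):
    """Clip a raw environment reward into [-bound, bound]
    before it is used as a learning target."""
    return float(np.clip(r, -bound, bound))

# Large rewards of either sign are squashed to the bound;
# small rewards pass through unchanged.
print(clip_reward(100.0))   # large positive reward
print(clip_reward(-5.0))    # large negative reward (penalty)
print(clip_reward(0.25))    # small reward, unaffected
```

Clipping preserves the sign of each reward but discards magnitude information, which keeps gradient scales stable at the cost of treating all large rewards as equal.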