awjuliani/DeepRL-Agents

# For Policy Network Problem

Closed this issue · 3 comments

Thanks for your code, but I have a question: if the rewards are negative, does the code still work?
If not, how can I fix it or ensure the loss stays positive?

Hi,
Rewards can be negative (if they represent a penalty).

For example, in the basic scenario:
`living_reward = -1`

which means the agent receives a negative reward on every step and therefore has to achieve its goal as quickly as possible.

Thanks a lot.
But I still have a question: to what value should the loss function converge? A negative value, or zero?

Hi cumttang,

Which RL algorithm are you referring to? In all cases the loss function is designed to support both positive and negative rewards, and having negative rewards should not interfere with training at all, as long as the environment's reward function is sensible.
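To make this concrete, here is a minimal NumPy sketch (not the repo's actual code) of the REINFORCE-style policy-gradient loss, `-log π(a|s) * advantage`. A negative advantage simply flips the sign of the gradient contribution, pushing the policy away from that action, so nothing breaks when rewards are negative. The function and variable names here are illustrative, not from the repository:

```python
import numpy as np

def policy_gradient_loss(log_probs, advantages):
    """REINFORCE-style loss: mean of -log pi(a|s) * advantage.

    Negative advantages flip the sign of each term, discouraging
    those actions; the loss itself may be positive or negative.
    """
    return -np.mean(log_probs * advantages)

# Log-probabilities of the actions actually taken (always <= 0)
log_probs = np.log(np.array([0.5, 0.2, 0.8]))
# Advantages can be negative; the loss remains well-defined
advantages = np.array([1.0, -2.0, 0.5])

loss = policy_gradient_loss(log_probs, advantages)
print(loss)
```

Note that the loss can itself go negative here, which is why its converged value is not a meaningful target: what matters is that the episode return improves, not that the loss approaches zero.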

One thing to keep in mind, though, is that methods such as DQN and A3C have issues with overly large rewards (either positive or negative). It is recommended that rewards fed into the network have a magnitude no greater than 1.
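A common way to enforce this is to clip rewards to [-1, 1] before they reach the network, as was done in the original DQN work on Atari. A minimal sketch (the helper name is illustrative, not from this repo):

```python
import numpy as np

def clip_reward(r, bound=1.0):
    """Clip a raw environment reward into [-bound, bound]
    before it is used as a learning target."""
    return float(np.clip(r, -bound, bound))

# Large rewards of either sign are squashed to the bound;
# small rewards pass through unchanged.
print(clip_reward(100.0))   # large positive reward
print(clip_reward(-5.0))    # large negative reward (penalty)
print(clip_reward(0.25))    # small reward, unaffected
```

Clipping preserves the sign of each reward but discards magnitude information, which keeps gradient scales stable at the cost of treating all large rewards as equal.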