Simple Policy Faulty Loss Function
wert23239 opened this issue
Your loss function for the simple policy doesn't really make sense:
"Loss = -log(pi) * A"
If the policy assigns a probability (pi) of .9 and the reward is 1,
your loss is about .046 (with a base-10 log).
But if pi is still .9 and the reward is 3,
your loss triples to about .137.
So the only reason your function works at all is that every episode is assigned the same amount of reward.
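A minimal sketch that reproduces the numbers above. It assumes a base-10 log, because that is what the .045/.046 figure implies. The function name `policy_loss` is hypothetical and is not taken from the repository's code.

```python
import math

def policy_loss(pi: float, advantage: float) -> float:
    """Loss = -log(pi) * A, using log10 (assumed, to match the numbers in this issue)."""
    return -math.log10(pi) * advantage

# Same action probability, two different rewards/advantages.
for reward in (1, 3):
    print(f"pi=0.9, A={reward}: loss={policy_loss(0.9, reward):.3f}")
# pi=0.9, A=1: loss=0.046
# pi=0.9, A=3: loss=0.137
```

Swapping `math.log10` for `math.log` (the natural log that policy-gradient code usually uses) scales both values by about 2.3. The loss still grows linearly with A.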