Dueling DQN equation
HencyChen opened this issue · 1 comments
HencyChen commented
Thanks for offering this wonderful code. But I have a question.
- Why, in the combination part of the equation, does the advantage A need to have its average subtracted? I've already referred to the paper but still don't understand.
HareshKarnan commented
^ Because there can be multiple pairs of V(s) and A(s,a) that satisfy the equation Q(s,a) = V(s) + A(s,a). For example, for any constant c:
Q(s,a) = V(s) + A(s,a) = (V(s)+c) + (A(s,a)-c)
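So, to learn a unique V and A, you subtract the mean of the advantage over actions, which forces the advantages in each state to average to 0. (The paper's other variant, subtracting the max instead of the mean, is the one that makes the advantage of the optimal action 0.)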
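Here is a small sketch of that point in plain Python, in case it helps. The function name `dueling_q` and the numbers are made up for illustration and are not taken from this repo's code; it only shows that the naive sum cannot tell V and A apart, while the mean-subtracted form can.

```python
def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) as Q = V + (A - mean(A)) for one state."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (a - mean_adv) for a in advantages]


# Hypothetical values for a single state with three actions.
v = 2.0
adv = [1.0, 0.0, -1.0]
c = 5.0  # arbitrary constant shift

# Naive sum Q = V + A: moving c from A into V gives exactly the same Q,
# so V and A cannot be recovered from Q.
naive_q = [v + a for a in adv]
naive_q_shifted = [(v + c) + (a - c) for a in adv]

# Mean-subtracted form: a constant shift of A cancels out, so only V sets
# the overall level of Q, and shifting V now really changes Q.
q = dueling_q(v, adv)
q_adv_shifted = dueling_q(v, [a - c for a in adv])
q_v_shifted = dueling_q(v + c, adv)

print(naive_q == naive_q_shifted)  # same Q from two different (V, A) pairs
print(q == q_adv_shifted)          # shift in A has no effect
print(q == q_v_shifted)            # shift in V is visible in Q
```

With the mean subtracted, the average of Q over actions equals V, so the network has only one way to split Q into the two streams.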