dxyang/DQN_pytorch

Dueling DQN equation

HencyChen opened this issue · 1 comment

Thanks for offering this wonderful code. But I have a question.

  1. Why, in the combination part of the equation, does the advantage A need to have its average subtracted? I've already referred to the paper but still don't understand.

The reason is that many different pairs of V(s) and A(s,a) satisfy the decomposition Q(s,a) = V(s) + A(s,a). For example, for any constant c,

Q(s,a) = V(s) + A(s,a) = (V(s)+c) + (A(s,a)-c)
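A quick numerical check of this ambiguity, as a minimal sketch in plain Python: the values below are hypothetical (one state, three actions), and `combine_naive` is an illustrative helper, not a function from this repo. Shifting V up by c and every advantage down by c leaves every Q value unchanged, so the network has no signal telling it which split is "right".

```python
def combine_naive(v, advantages):
    """Naive aggregation: Q(s, a) = V(s) + A(s, a) for every action a."""
    return [v + a for a in advantages]

# Hypothetical values for a single state with three actions.
v = 1.0
advantages = [0.5, -0.5, 2.0]
c = 10.0  # arbitrary constant shifted between V and A

q_original = combine_naive(v, advantages)
# Move c into V and take it back out of every advantage.
q_shifted = combine_naive(v + c, [a - c for a in advantages])

print(q_original)  # [1.5, 0.5, 3.0]
print(q_shifted)   # [1.5, 0.5, 3.0]  (identical Q, different V and A)
```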

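So, given only Q, the split into V and A is not identifiable. To pin down a unique V and A, the advantage stream is given a fixed reference point. Subtracting the maximum advantage over actions makes the advantage of the greedy (optimal) action 0, so V(s) = max_a Q(s,a). Subtracting the mean instead, as this code does, makes the advantages average to 0 across actions, so V(s) equals the mean of Q(s,a) over actions. The Dueling DQN paper proposes both and uses the mean version because it makes optimization more stable.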
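The two aggregation rules can be sketched in plain Python as below. This is an illustrative sketch with hypothetical values and helper names (`combine_mean`, `combine_max`), not the repo's actual code. With either rule, the constant shift from the equation above no longer cancels, and V can be read back from Q: it is the mean of Q under the mean rule and the max of Q under the max rule.

```python
def combine_mean(v, advantages):
    """Dueling aggregation with mean subtraction:
    Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))."""
    mean_adv = sum(advantages) / len(advantages)
    return [v + (a - mean_adv) for a in advantages]

def combine_max(v, advantages):
    """Alternative aggregation: subtract the max advantage instead,
    so the greedy action gets advantage exactly 0."""
    max_adv = max(advantages)
    return [v + (a - max_adv) for a in advantages]

# Hypothetical values for a single state with three actions.
v = 1.0
advantages = [1.0, -1.0, 3.0]
c = 10.0

q_mean = combine_mean(v, advantages)
q_mean_shifted = combine_mean(v + c, [a - c for a in advantages])
q_max = combine_max(v, advantages)

print(q_mean)          # [1.0, -1.0, 3.0]
print(q_mean_shifted)  # [11.0, 9.0, 13.0]  (the shift now changes Q)
print(sum(q_mean) / len(q_mean))  # 1.0 == v under the mean rule
print(max(q_max))                 # 1.0 == v under the max rule
```

In a batched PyTorch network the mean rule is commonly written as `value + advantage - advantage.mean(dim=1, keepdim=True)`, where `keepdim=True` keeps the action dimension so the subtraction broadcasts over actions; check the repo's model definition for its exact form.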