Dueling DQN equation

Question

HencyChen opened this issue 7 years ago · 1 comments

Thanks for offering this wonderful code. But I have a question.

Why, in the combination part of the equation, does the advantage A need to have its average subtracted? I've already referred to the paper but still don't understand.

Answer 1 · 2020-09-28T20:21:54.000Z

It's because many different pairs V(s) and A(s,a) can satisfy the same advantage equation. For example, for any constant c,

Q(s,a) = V(s) + A(s,a) = (V(s)+c) + (A(s,a)-c)

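So, to learn a unique V and A, you subtract the mean of the advantages over actions, which forces the advantages to average to 0 in each state. (Making the advantage of the optimal action exactly 0 is what the max-subtraction variant in the paper does; the mean version is the one usually used because the paper reports it is more stable.)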
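
To make this concrete, here is a minimal NumPy sketch of the mean-subtracted combination. It is not the repository's code: the function name `dueling_q` and the toy numbers are made up for illustration, and the shapes (batch, 1) for V and (batch, n_actions) for A are an assumption.

```python
import numpy as np

def dueling_q(value, advantage):
    """Hypothetical helper: Q = V + (A - mean over actions of A)."""
    # value: shape (batch, 1), advantage: shape (batch, n_actions)
    return value + advantage - advantage.mean(axis=1, keepdims=True)

value = np.array([[2.0]])                 # toy V(s)
advantage = np.array([[1.0, 0.0, -1.0]])  # toy A(s, a) for 3 actions
c = 5.0                                   # arbitrary constant shift

# Naive sum: two different (V, A) pairs give the same Q,
# so the network cannot tell which V and A it should learn.
q_naive_a = value + advantage
q_naive_b = (value + c) + (advantage - c)

# Mean-subtracted version: shifting A by a constant has no effect on Q,
# while shifting V shifts Q by the same amount, so V is pinned down.
q_a = dueling_q(value, advantage)
q_b = dueling_q(value, advantage + c)
q_c = dueling_q(value + c, advantage)

print(np.allclose(q_naive_a, q_naive_b))  # naive sum is ambiguous
print(np.allclose(q_a, q_b))              # constant shift in A is cancelled
print(np.allclose(q_c, q_a + c))          # V alone controls the offset
```

With this combination, the mean of Q(s,a) over actions equals V(s), which is the extra constraint that removes the ambiguity shown in the equation above.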