Q-value updating problem in DDQN
Closed this issue · 2 comments
eric-liuyd commented
Hello,
I think there is a mistake in the Q-value update of your DDQN code (ddqn.py). Shouldn't np.argmax(next_q[0,:]) be np.argmax(next_q[i,:]) on line 64? It doesn't make sense to select the next action from the same st+1 (the batch's first sample) when updating the Q-value for every <st, at, st+1, rt> in the minibatch.
```python
# Apply Bellman Equation on batch samples to train our DDQN
q = self.agent.predict(s)
next_q = self.agent.predict(new_s)
q_targ = self.agent.target_predict(new_s)
for i in range(s.shape[0]):
    old_q = q[i, a[i]]
    if d[i]:
        q[i, a[i]] = r[i]
    else:
        next_best_action = np.argmax(next_q[0,:])  # problematic, should likely be np.argmax(next_q[i,:])
        q[i, a[i]] = r[i] + self.gamma * q_targ[i, next_best_action]
```
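For reference, here is a minimal, self-contained sketch of the corrected per-sample update. The networks are stood in by random Q-tables (the real code calls `self.agent.predict` and `self.agent.target_predict`); the point is only the `next_q[i, :]` indexing:

```python
import numpy as np

# Stand-in batch data; in the repo these come from the replay buffer
# and the online/target networks.
rng = np.random.default_rng(0)
batch, n_actions, gamma = 4, 3, 0.99

q      = rng.random((batch, n_actions))   # Q(s, .)  from online network
next_q = rng.random((batch, n_actions))   # Q(s', .) from online network
q_targ = rng.random((batch, n_actions))   # Q(s', .) from target network
a = rng.integers(0, n_actions, size=batch)  # actions taken
r = rng.random(batch)                       # rewards
d = np.array([False, True, False, False])   # done flags

for i in range(batch):
    if d[i]:
        q[i, a[i]] = r[i]
    else:
        # Double DQN: the online network selects the action, the target
        # network evaluates it -- indexed per sample (next_q[i, :]),
        # not always the first sample (next_q[0, :]).
        next_best_action = np.argmax(next_q[i, :])
        q[i, a[i]] = r[i] + gamma * q_targ[i, next_best_action]
```

With the `[0, :]` bug, every transition in the minibatch would bootstrap from the action chosen for the first transition's next state, which silently corrupts the targets for all other samples.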
germain-hug commented
Well spotted! That is a mistake indeed, just corrected it, thanks for reporting it
eric-liuyd commented
> Well spotted! That is a mistake indeed, just corrected it, thanks for reporting it

Yes, you're welcome