Q-value updating problem in DDQN
Closed this issue · 2 comments
eric-liuyd commented
Hello,
I think there is a mistake in the Q-value update of your DDQN code (ddqn.py). Shouldn't np.argmax(next_q[0,:]) be np.argmax(next_q[i,:]) on line 64? It doesn't make sense to select the next action from the same st+1 (the batch's first sample) when updating the Q-value for every <st, at, st+1, rt> in the minibatch.
```python
# Apply Bellman Equation on batch samples to train our DDQN
q = self.agent.predict(s)
next_q = self.agent.predict(new_s)
q_targ = self.agent.target_predict(new_s)
for i in range(s.shape[0]):
    old_q = q[i, a[i]]
    if d[i]:
        q[i, a[i]] = r[i]
    else:
        next_best_action = np.argmax(next_q[0,:])  # problematic, should likely be np.argmax(next_q[i,:])
        q[i, a[i]] = r[i] + self.gamma * q_targ[i, next_best_action]
```
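For reference, here is a minimal, self-contained sketch of the corrected per-sample update. The networks are stood in by random Q-tables (the real code calls `self.agent.predict` and `self.agent.target_predict`); the point is only the `next_q[i, :]` indexing:

```python
import numpy as np

# Stand-in batch data; in the repo these come from the replay buffer
# and the online/target networks.
rng = np.random.default_rng(0)
batch, n_actions, gamma = 4, 3, 0.99

q      = rng.random((batch, n_actions))   # Q(s, .)  from online network
next_q = rng.random((batch, n_actions))   # Q(s', .) from online network
q_targ = rng.random((batch, n_actions))   # Q(s', .) from target network
a = rng.integers(0, n_actions, size=batch)  # actions taken
r = rng.random(batch)                       # rewards
d = np.array([False, True, False, False])   # done flags

for i in range(batch):
    if d[i]:
        q[i, a[i]] = r[i]
    else:
        # Double DQN: the online network selects the action, the target
        # network evaluates it -- indexed per sample (next_q[i, :]),
        # not always the first sample (next_q[0, :]).
        next_best_action = np.argmax(next_q[i, :])
        q[i, a[i]] = r[i] + gamma * q_targ[i, next_best_action]
```

With the `[0, :]` bug, every transition in the minibatch would bootstrap from the action chosen for the first transition's next state, which silently corrupts the targets for all other samples.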
germain-hug commented
Well spotted! That is a mistake indeed, just corrected it, thanks for reporting it
eric-liuyd commented
> Well spotted! That is a mistake indeed, just corrected it, thanks for reporting it

Yes, you're welcome