germain-hug/Deep-RL-Keras

DDQN.py function memorize: incorrect Q values ?

Closed this issue · 0 comments

Comparing against the [PER paper](https://arxiv.org/pdf/1511.05952.pdf):

Algorithm 1, line 11 (TD error):

delta_j = R_j + gamma_j * Q_target(S_j, arg max_a Q(S_j, a)) - Q(S_{j-1}, A_{j-1})

If I am not mistaken, the j-1 subscript corresponds to the current state in the implementation, i.e. state, action, reward, and done all refer to step j-1, while new_state refers to step j.

But line 125 in ddqn.py takes the arg max over the current state rather than the next one:
q_val = self.agent.predict(state)
next_best_action = np.argmax(q_val)

whereas it should be

q_val = self.agent.predict(new_state)
next_best_action = np.argmax(q_val)
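To make the distinction concrete, here is a minimal sketch of the double-DQN TD error as described in the paper: the online network *selects* the best action in new_state, and the target network *evaluates* it. The q_online and q_target functions below are hypothetical stand-ins (they ignore their state argument), not the repo's actual agent API.

```python
import numpy as np

# Hypothetical stand-ins for the online and target networks:
# each maps a state to a vector of Q-values over actions.
def q_online(state):
    return np.array([1.0, 3.0, 2.0])

def q_target(state):
    return np.array([0.5, 1.5, 4.0])

def td_error(state, action, reward, new_state, done, gamma=0.99):
    """Double-DQN TD error (PER paper, Algorithm 1, line 11).

    The online network picks the next action from new_state (not
    from state -- that is the bug discussed above); the target
    network supplies its value.
    """
    if done:
        target = reward
    else:
        # Select with the online net, evaluate with the target net,
        # both on new_state (step j), not state (step j-1).
        next_best_action = np.argmax(q_online(new_state))
        target = reward + gamma * q_target(new_state)[next_best_action]
    # Subtract the online Q-value of the action actually taken at j-1.
    return target - q_online(state)[action]

delta = td_error(state=None, action=1, reward=1.0, new_state=None, done=False)
```

With these toy Q-values, the online net prefers action 1 in new_state, so the target is 1.0 + 0.99 * q_target[1], and delta compares that against q_online(state)[1].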