Mistake in prioritised replay?
Khev opened this issue · 3 comments
Hello again,
FYI: I think you might have defined the TD error wrong in Deep-RL-Keras/DDQN/ddqn.py. On line 125 you have
"""
q_val = self.agent.predict(new_state) ## I think the argument should be 'state' here
q_val_t = self.agent.target_predict(new_state)
next_best_action = np.argmax(q_val)
new_val = reward + self.gamma * q_val_t[0, next_best_action]
td_error = abs(new_val - q_val)[0]
"""
But I think the correct definition is
td_error = abs( Q(s, a) - y_i )
with y_i = r_i + gamma * max_a'( Q(s', a') )
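For reference, here is what that definition might look like in code, assuming the same predict/target_predict interface as the repo and ignoring terminal-state handling (a sketch, not the repo's actual code):
"""
import numpy as np

# TD error as |Q(s, a) - y_i| with y_i = r_i + gamma * max_a'( Q(s', a') ).
# predict/target_predict are assumed to return a (1, n_actions) array.
q_s = self.agent.predict(state)  # online Q(s, .)
y = reward + self.gamma * np.max(self.agent.target_predict(new_state))
td_error = abs(y - q_s[0, action])  # error for the action actually taken
"""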
Apologies for the late response! I believe you are correct; that was indeed a typo. I just fixed the issue, thanks for reporting it!
I think there is still a problem with the latest code:
if(self.with_per):
    q_val = self.agent.predict(state)
    q_val_t = self.agent.target_predict(new_state)
    next_best_action = np.argmax(q_val)
    new_val = reward + self.gamma * q_val_t[0, next_best_action]
    td_error = abs(new_val - q_val)[0]
"next_best_action = np.argmax(q_val)" should be "next_best_action = np.argmax(self.agent.predict(new_state))".
I think you are right.
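For context on why the exact value matters: in prioritised replay the TD error typically drives the transition's sampling priority, so a wrong td_error skews which transitions get replayed. A hedged sketch, where buffer.memorize and alpha are hypothetical names, not necessarily this repo's API:
"""
# Hypothetical usage: store the transition with |TD error| driving its priority,
# following the standard PER rule p_i = (|delta_i| + eps)^alpha.
priority = (td_error + 1e-6) ** alpha
buffer.memorize(state, action, reward, done, new_state, priority)
"""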