germain-hug/Deep-RL-Keras

Mistake in prioritised replay?

Closed this issue · 3 comments

Khev commented

Hello again,

FYI: I think you might have defined the TD error wrong in "Deep-RL-Keras/DDQN/ddqn.py". On line 125 you have

"""
q_val = self.agent.predict(new_state) ## I think the argument should be 'state' here
q_val_t = self.agent.target_predict(new_state)
next_best_action = np.argmax(q_val)
new_val = reward + self.gamma * q_val_t[0, next_best_action]
td_error = abs(new_val - q_val)[0]
"""

But I think the correct definition is

td_error = abs(Q(s, a) - yi)
with yi = ri + gamma * max over a' of Q(s', a')
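
Spelled out with the same helpers as the quoted snippet, that would read roughly as below. This is only a sketch of the definition: the `action` index of the transition is assumed to be in scope (it does not appear in the quoted code), and the bootstrap term uses the target network, as the quoted code does.

    q_s = self.agent.predict(state)                  # Q(s, .) from the online network
    q_next_t = self.agent.target_predict(new_state)  # Q(s', .) from the target network
    yi = reward + self.gamma * np.max(q_next_t)      # yi = ri + gamma * max over a' of Q(s', a')
    td_error = abs(q_s[0, action] - yi)              # |Q(s, a) - yi|; 'action' is the action taken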

Apologies for the late response! I believe you are correct; that was indeed a typo. I've just fixed the issue, thanks for reporting it.

I think there is still a problem with the latest code:

        if(self.with_per):
            q_val = self.agent.predict(state)
            q_val_t = self.agent.target_predict(new_state)
            next_best_action = np.argmax(q_val)
            new_val = reward + self.gamma * q_val_t[0, next_best_action]
            td_error = abs(new_val - q_val)[0]

"next_best_action = np.argmax(q_val)" should be "next_best_action = np.argmax(self.agent.predict(new_state))".

I think you are right.