rmst/ddpg

bias annealing weight updates

zacwellmer opened this issue · 1 comment

I could be wrong, but it does not seem that you are annealing the bias with importance sampling as suggested in the paper (3.4).

w_i = (1/N * 1/P(i))^beta

I think you would have to multiply your gradients by this w_i term.
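
To make the suggestion concrete, here is a rough sketch of how the weights could be computed and applied. It only matters if transitions are sampled by priority rather than uniformly. The function names, the `alpha` exponent on the priorities, and the normalization by the largest weight are my assumptions for illustration and are not taken from this repo.

```python
def importance_weights(priorities, beta, alpha=1.0):
    """Sketch: w_i = (1/N * 1/P(i))^beta for every stored transition.

    Assumes P(i) = p_i^alpha / sum_k p_k^alpha (hypothetical helper, not repo code).
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]                 # P(i)
    n = len(priorities)                                 # N
    weights = [(1.0 / (n * p)) ** beta for p in probs]  # w_i
    # Assumption: rescale so the largest weight is 1, which only ever
    # scales updates downward.
    max_w = max(weights)
    return [w / max_w for w in weights]


def weighted_td_loss(td_errors, weights):
    """Mean of w_i * delta_i^2.

    Differentiating this loss scales each sample's gradient by w_i,
    which is the same as multiplying the gradients by w_i directly.
    """
    return sum(w * d * d for w, d in zip(weights, td_errors)) / len(td_errors)


if __name__ == "__main__":
    w = importance_weights([1.0, 3.0], beta=1.0)
    print(w)
    print(weighted_td_loss([1.0, 2.0], w))
```

With beta = 0 every weight is 1 (no correction), and with beta = 1 the non-uniform sampling probabilities are fully compensated, so annealing beta toward 1 over training anneals the bias.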

My apologies, I thought you had included prioritized replay.