Why do you need copyTargetQNetwork?
fevemania opened this issue · 2 comments
fevemania commented
I don't understand the purpose of copyTargetQNetwork. Why do we need QValueT to evaluate QValue_batch? Is it to make the training process more stable?
saselovejulie commented
I'm confused about this function too:
if self.timeStep % UPDATE_TIME == 0:
Since this code copies QValue to QValueT every 100 steps, why do we need two of them?
FrankRouter commented
This is explained in the DQN Nature paper:
We address these instabilities with a novel variant of Q-learning, which uses two key ideas. First, ... Second, we used an iterative update that adjusts the action-values (Q) towards target values that are only periodically updated, thereby reducing correlations with the target.
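To make the quoted idea concrete, here is a minimal NumPy sketch of a periodically updated target. It is not the code from this repo: `TinyDQN`, the linear Q-function, `GAMMA`, and `LEARNING_RATE` are assumptions made up for illustration, and only `UPDATE_TIME`, `timeStep`, and the QValue/QValueT naming come from the thread. The point it tries to show is that the targets are computed from a frozen copy of the weights, so they do not shift on every gradient step.

```python
import numpy as np

UPDATE_TIME = 100     # sync interval, the "every 100 steps" mentioned above
GAMMA = 0.99          # discount factor (assumed value)
LEARNING_RATE = 0.01  # step size (assumed value)


class TinyDQN:
    """Hypothetical linear Q-function with a periodically synced target copy."""

    def __init__(self, state_dim, num_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Online weights, playing the role of QValue.
        self.W = rng.normal(scale=0.1, size=(state_dim, num_actions))
        # Target weights, playing the role of QValueT.
        self.WT = self.W.copy()
        self.timeStep = 0

    def copy_target_q_network(self):
        # Overwrite the target weights with the current online weights.
        self.WT = self.W.copy()

    def train_step(self, states, actions, rewards, next_states, terminals):
        # Targets use the frozen copy WT, so they stay fixed between syncs.
        q_next = next_states @ self.WT
        y = rewards + GAMMA * (1.0 - terminals) * q_next.max(axis=1)

        # Predictions use the online weights W, which change every step.
        q = states @ self.W
        td_error = q[np.arange(len(actions)), actions] - y

        # Gradient of 0.5 * mean(td_error ** 2) with respect to W,
        # accumulated per action row and then transposed back.
        grad = np.zeros((self.W.shape[1], self.W.shape[0]))
        np.add.at(grad, actions, td_error[:, None] * states)
        self.W -= LEARNING_RATE * grad.T / len(actions)

        self.timeStep += 1
        if self.timeStep % UPDATE_TIME == 0:
            self.copy_target_q_network()
```

If the targets were computed from `self.W` instead of `self.WT`, every update would also move the value being chased, which is the kind of instability the paper describes. Keeping two sets of weights and syncing them only every `UPDATE_TIME` steps holds the target still in between.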