floodsung/DRL-FlappyBird

Why do you need copyTargetQNetwork?

fevemania opened this issue · 2 comments

I don't understand the purpose of copyTargetQNetwork. Why do we need QValueT to evaluate QValue_batch? Is it to make the training process more stable?

I'm confused about this function too:

if self.timeStep % UPDATE_TIME == 0:
    self.copyTargetQNetwork()

Since this code copies QValue to QValueT every 100 steps, why do we need two of them?

This is explained in the DQN Nature paper:

We address these instabilities with a novel variant of Q-learning, which uses two key ideas. First, ... Second, we used an iterative update that adjusts the action-values (Q) towards target values that are only periodically updated, thereby reducing correlations with the target.
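To make the quoted idea concrete, here is a minimal sketch of a periodically updated target. It is not the repository's code: a small table stands in for the network, and `TinyDQN`, `train_step`, `copy_target_q_network`, `GAMMA`, and `LEARNING_RATE` are hypothetical names chosen for illustration. Only `UPDATE_TIME` and the "copy every 100 steps" rule come from the thread.

```python
import copy

GAMMA = 0.99          # discount factor (assumed value)
LEARNING_RATE = 0.1   # step size (assumed value)
UPDATE_TIME = 100     # sync period in steps, as mentioned in the thread


class TinyDQN:
    """Tabular stand-in for a Q-network with a separate target copy."""

    def __init__(self, n_states, n_actions):
        # Online estimates, updated on every step (plays the role of QValue).
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        # Frozen copy, used only to compute targets (plays the role of QValueT).
        self.q_target = copy.deepcopy(self.q)
        self.time_step = 0

    def copy_target_q_network(self):
        # Overwrite the frozen copy with the current online estimates.
        self.q_target = copy.deepcopy(self.q)

    def train_step(self, s, a, r, s_next, terminal):
        # The target y is built from the frozen copy, so it does not shift
        # every time the online estimates change.
        y = r if terminal else r + GAMMA * max(self.q_target[s_next])
        self.q[s][a] += LEARNING_RATE * (y - self.q[s][a])

        self.time_step += 1
        if self.time_step % UPDATE_TIME == 0:
            self.copy_target_q_network()


agent = TinyDQN(n_states=2, n_actions=2)
for _ in range(UPDATE_TIME - 1):
    agent.train_step(s=0, a=0, r=1.0, s_next=1, terminal=False)
# Between syncs the two tables differ: q has moved, q_target has not.
```

If the targets were computed from `q` itself, every update would also move the value it is chasing. Holding `q_target` fixed between syncs gives the online estimates a stationary target for `UPDATE_TIME` steps, which is the stabilizing effect the paper describes.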