Why did you need copyTargetQNetwork

Question

fevemania opened this issue 8 years ago · 2 comments

I don't understand the purpose of copyTargetQNetwork. Why do we need QValueT to evaluate QValue_batch? Is it to make the training process more stable?

Answer 1 · 2017-09-07T00:56:03.000Z

I'm confused about this function too:

if self.timeStep % UPDATE_TIME == 0:
    self.copyTargetQNetwork()

Since this code copies QValue into QValueT every 100 steps, why do we need two of them?

Answer 2 · 2018-07-14T03:14:07.000Z

This is explained in the DQN Nature paper:

> We address these instabilities with a novel variant of Q-learning, which uses two key ideas. First, ... Second, we used an iterative update that adjusts the action-values (Q) towards target values that are only periodically updated, thereby reducing correlations with the target.
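
To make the second idea concrete, here is a minimal sketch in plain Python. It is not this repository's implementation: the linear Q-function, the class `TinyDQN`, and the constants `GAMMA`, `LEARNING_RATE`, `N_ACTIONS`, and `N_FEATURES` are all assumptions made for illustration. `online` plays the role that QValue appears to play here, and `target` the role of QValueT. The training target is computed from the frozen `target` weights, only `online` is updated on each step, and the two are synced every `UPDATE_TIME` steps.

```python
import copy
import random

UPDATE_TIME = 100      # steps between target syncs (the value mentioned in this thread)
GAMMA = 0.99           # discount factor (assumed)
LEARNING_RATE = 0.01   # step size (assumed)
N_ACTIONS = 2          # assumed
N_FEATURES = 3         # assumed


def q_values(weights, state):
    """Linear Q-function: one weight vector per action."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


class TinyDQN:
    """Hypothetical toy agent; only illustrates the two-network update."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.online = [[rng.uniform(-0.1, 0.1) for _ in range(N_FEATURES)]
                       for _ in range(N_ACTIONS)]
        self.target = copy.deepcopy(self.online)  # frozen copy of the weights
        self.time_step = 0

    def copy_target_q_network(self):
        # Overwrite the target weights with the current online weights.
        self.target = copy.deepcopy(self.online)

    def train_step(self, state, action, reward, next_state, terminal):
        # The bootstrap target uses the *target* weights, so it stays fixed
        # between syncs instead of shifting with every gradient step.
        y = reward
        if not terminal:
            y += GAMMA * max(q_values(self.target, next_state))

        # Only the online weights are moved toward the target.
        td_error = y - q_values(self.online, state)[action]
        self.online[action] = [w + LEARNING_RATE * td_error * s
                               for w, s in zip(self.online[action], state)]

        self.time_step += 1
        if self.time_step % UPDATE_TIME == 0:
            self.copy_target_q_network()
        return td_error
```

If the target were computed from `online` instead, every update would also move the value the update is chasing. Keeping a second set of weights that changes only every `UPDATE_TIME` steps is what keeps the target still in between.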