"global_step" in A3C
Emerald01 opened this issue · 2 comments
Hi,
I am confused about the "global_step" implementation in the A3C class. It should track the global step in the training loop, and it is managed by the Supervisor. However, I do not see how it gets updated in the source code. Maybe I missed something.
Firstly, what does the following mean? I think what it does is local_step = global_step + state_dim, but why would state_dim be added on top of global_step?
inc_step = self.global_step.assign_add(tf.shape(pi.x)[0])
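For what it's worth, `tf.shape(pi.x)[0]` is the first (batch) dimension of the input tensor, i.e. the number of timesteps in the rollout, not the state dimension. A plain-Python sketch of the bookkeeping (no TensorFlow; the shape below is illustrative, not taken from the repo):

```python
# Hedged sketch of what self.global_step.assign_add(tf.shape(pi.x)[0]) computes.
# Illustrative rollout: 20 frames, each 42x42x1.
batch_shape = (20, 42, 42, 1)   # (timesteps, height, width, channels)

# tf.shape(pi.x)[0] takes the FIRST dimension of the batch -- the number of
# timesteps in the rollout -- not state_dim.
inc = batch_shape[0]

global_step = 0
global_step += inc   # what assign_add does on each run
print(global_step)   # 20: one count per environment step in the batch
```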
Secondly, I do not see any ops on global_step. In train_op, inc_step is grouped together with the apply_gradients() op; I think this means that every call to train_op increases inc_step by state_dim due to the code above, but what does that mean? On the other hand, global_step has no update ops as far as I can see.
self.train_op = tf.group(opt.apply_gradients(grads_and_vars), inc_step)
However, this global_step is treated as an op to run, and I do not see where it operates. It looks to me like it magically increments somewhere?
fetches = [self.train_op, self.global_step]
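The grouping and the fetch can be mimicked in plain Python (a sketch of the semantics, not the repo's code): running the grouped op triggers both side effects, and fetching global_step alongside train_op then reads the counter, which in practice reflects the increment from the same run.

```python
# Plain-Python stand-in for tf.group(opt.apply_gradients(grads_and_vars), inc_step):
# one runnable op whose execution triggers both grouped side effects.
state = {"weights": 1.0, "global_step": 0}

def apply_gradients():
    state["weights"] -= 0.1          # pretend one gradient step

def inc_step(batch_size):
    state["global_step"] += batch_size

def train_op(batch_size):
    # tf.group returns a single op; running it runs both grouped ops.
    apply_gradients()
    inc_step(batch_size)

# fetches = [self.train_op, self.global_step]: run the op, then read the
# variable's value -- global_step itself needs no update op of its own.
train_op(20)
fetched_global_step = state["global_step"]
print(fetched_global_step)   # 20
```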
Hi,
My understanding is that the global step counts how many frames you have gone over in the game. For A3C, each input x is a stack of 4 frames, so each time you run pi (the policy network), tf.shape(pi.x)[0] gives how many frames (global steps) the game ran over in that batch. In this way the global step increases and accumulates. global_step is a TensorFlow variable, created like this: https://www.tensorflow.org/api_docs/python/tf/get_variable.
Glad to discuss; if you think I'm wrong, feel free to let me know.
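Under that reading, the accumulation across workers can be sketched as follows (the worker count and rollout lengths below are made up for illustration):

```python
# Sketch: each worker's train_op adds its rollout length (tf.shape(pi.x)[0])
# to the shared counter, so global_step ends up counting total frames seen.
global_step = 0
rollout_lengths = [20, 20, 13, 20]   # illustrative batches from four workers

for n in rollout_lengths:
    global_step += n                 # one assign_add per train_op run

print(global_step)   # 73 frames processed in total
```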
Sounds right to me, then. Thank you!