Questions about Experience Replay Buffer
kkhetarpal opened this issue · 1 comments
Hi @miyosuda
Thanks again for the open-source code implementation. It is of great help.
I have a question about how the experience replay buffer is filled.
In main.py, the top-level process is called for each environment:
diff_global_t = trainer.process(self.sess,
                                self.global_t,
                                self.summary_writer,
                                self.summary_op,
                                self.score_input)
As I understand it, the replay buffer is filled in the lines below, since the experience buffer will not be full at the start. Am I correct?
# Fill experience replay buffer
if not self.experience.is_full():
  self._fill_experience(sess)
  return 0
Then, inside the base A3C process, new frames keep being added in the lines below:
frame = ExperienceFrame(prev_state, reward, action, terminal, pixel_change,
                        last_action, last_reward)
# Store to experience
self.experience.add_frame(frame)
So, just to confirm: once the buffer is full, is it correct that the _process_base function always controls what goes into the experience replay buffer? And at the very start, do the auxiliary tasks (VR, RP, PC) use the experience frames from the initial fill, which happened outside the base process? Am I missing something?
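To make my understanding concrete, here is a minimal sketch of the flow as I read it. It is not the repository's code: ReplayBuffer, Trainer, _next_frame, _process_aux and the (source, step) frame tuples are simplified stand-ins I made up, and I am assuming that each pre-fill call adds one frame and that the buffer drops its oldest frame once it is full.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer; the oldest frame is dropped when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = deque(maxlen=capacity)

    def is_full(self):
        return len(self.frames) >= self.capacity

    def add_frame(self, frame):
        self.frames.append(frame)

    def sample(self, n):
        # Stand-in for whatever sampling the auxiliary tasks really use.
        return random.sample(list(self.frames), n)


class Trainer:
    """Hypothetical trainer mirroring the control flow I am asking about."""

    def __init__(self, capacity):
        self.experience = ReplayBuffer(capacity)
        self.step = 0

    def _next_frame(self, source):
        # Tag each frame with where it came from so the flow is visible.
        self.step += 1
        return (source, self.step)

    def _fill_experience(self):
        # Pre-fill phase: store a frame, no training (assumed: one per call).
        self.experience.add_frame(self._next_frame("prefill"))

    def _process_base(self):
        # Base A3C step: every new frame goes into the buffer from here on.
        self.experience.add_frame(self._next_frame("base"))
        return 1

    def _process_aux(self):
        # Auxiliary tasks only read from the buffer.
        return self.experience.sample(2)

    def process(self):
        if not self.experience.is_full():
            self._fill_experience()
            return 0
        diff = self._process_base()
        self._process_aux()
        return diff


trainer = Trainer(capacity=3)
results = [trainer.process() for _ in range(5)]
sources = [source for source, _ in trainer.experience.frames]
```

With a capacity of 3, the first three calls only pre-fill and return 0. From the fourth call on, the base step writes every new frame, so the first auxiliary samples come mostly from pre-filled frames, which are then gradually replaced by frames from the base process.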
Thank you for taking the time to clarify these points.
Resolved