lcswillems/rl-starter-files

the initial memory of chunks

rizar opened this issue · 4 comments

rizar commented

I have a question about this implementation. To ask it, I first need to introduce a bit of terminology. At each PPO step we use --procs processes to produce a rollout of --frames-per-proc steps each. All these rollouts are then concatenated, and several epochs of optimization are performed. At each epoch we split the concatenated rollouts into chunks of --recurrence steps. For each such chunk we initialize the LSTM memory with the values remembered from the rollout stage or with values from the previous PPO epoch.

My question is as follows. In this line we update the memory state for the next epoch. However, we don't perform an update when i == self.recurrence - 1. That means that some of the memories in exps.memory, in particular the ones at indices 0, self.recurrence, 2 * self.recurrence, and so on, will be stale.

Is that correct? Perhaps memory should not be updated at all in PPO?
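To make the indices in the question concrete, here is a minimal sketch of the bookkeeping only (no model, no PPO loss). It assumes, as described above, that each step i of a chunk stores its output memory into slot start + i + 1 unless i == recurrence - 1. The function name stale_slots and the toy sizes are hypothetical and are not taken from the repository.

```python
def stale_slots(num_frames, recurrence):
    """Return the memory slots that are never rewritten during one epoch.

    Hypothetical model of the update rule discussed in this thread:
    a chunk starts at `start`, and step i writes slot start + i + 1
    only when i < recurrence - 1.
    """
    written = set()
    for start in range(0, num_frames, recurrence):
        for i in range(recurrence):
            if i < recurrence - 1:
                written.add(start + i + 1)
    return sorted(set(range(num_frames)) - written)


# Toy sizes (assumed for illustration): 12 frames, chunks of 4 steps.
print(stale_slots(num_frames=12, recurrence=4))  # [0, 4, 8]
```

With these toy sizes, the slots that keep their rollout-time value are exactly the chunk starting points 0, recurrence, 2 * recurrence.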

I am not sure I understand "will be stale" correctly (the French translations don't make sense to me). Do you mean that it will not be updated and will therefore be outdated?

There is also the shift I apply (of self.recurrence // 2) in half of the epochs. I don't know if it helps.

rizar commented

Thank you for the translation!

If there is no shift, yes, you are right, the ones at indices 0, self.recurrence, ... will be stale, but I introduced the shift to overcome this issue... This is maybe not a good idea, though. I ran some experiments with and without the shift, and I got better results with it.
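As a rough illustration of what the shift changes, the sketch below compares the slots written in an unshifted epoch with those written in an epoch shifted by recurrence // 2. It is only a toy model: the name slots_written is hypothetical, and it assumes that a shifted chunk which would run past the end of the data is simply dropped, which may differ from what the repository does.

```python
def slots_written(num_frames, recurrence, shift):
    """Slots of the memory buffer rewritten in one epoch (toy model).

    Assumption: chunks start at shift, shift + recurrence, ... and a chunk
    that would extend past num_frames is skipped. Step i of a chunk writes
    slot start + i + 1, except for the last step of the chunk.
    """
    written = set()
    for start in range(shift, num_frames - recurrence + 1, recurrence):
        written.update(start + i + 1 for i in range(recurrence - 1))
    return written


num_frames, recurrence = 12, 4  # toy sizes, assumed for illustration
plain = slots_written(num_frames, recurrence, shift=0)
shifted = slots_written(num_frames, recurrence, shift=recurrence // 2)

print(sorted(shifted))                                       # [3, 4, 5, 7, 8, 9]
print(sorted(set(range(num_frames)) - (plain | shifted)))    # [0]
```

Under these assumptions, the shifted epoch refreshes slots 4 and 8, which the unshifted epoch never touches, so across both kinds of epoch only slot 0 keeps its rollout-time value.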

Actually, I don't have the time to test whether the memory is failing to work because of this. But if you have a question, I will answer with pleasure and as fast as possible! From my point of view, there is also a problem with the memory, but I spent so much time debugging and didn't find a way to get better results...

I think a solution could be to look at another implementation that uses memory, but I didn't manage to find one... Have you already used a working implementation of memory?

rizar commented