facebookresearch/OccupancyAnticipation

Mapper training during policy training

fraazor opened this issue · 2 comments

Hi there,

I have been working on a variation of the projection unit that adds a different type of "fog of war" approach for the sensors. However, I do not fully understand the implementation, because the mapper training and policy training seem to happen simultaneously. Wouldn't that lead to ordered, correlated data/label pairs in the supervised part? Is there some shuffling happening that I am missing?
I would really appreciate an explanation of how this was approached.

We have a large replay buffer that stores data over multiple episodes. We randomly sample data from this buffer to break the correlation in data.
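
For readers following along, here is a minimal sketch of that idea. It is not the repository's actual implementation: the class name `ReplayBuffer`, its methods, and the `capacity` and `seed` parameters are all hypothetical and chosen only for illustration. It assumes each training sample is an (observation, label) pair and that the oldest samples are dropped once the buffer is full.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer of (observation, label) pairs."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen silently evicts the oldest entry when full.
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, observation, label):
        self.storage.append((observation, label))

    def sample(self, batch_size):
        if not self.storage:
            raise ValueError("cannot sample from an empty buffer")
        # Draw without replacement, uniformly over everything stored,
        # so one batch mixes steps from many different episodes.
        batch_size = min(batch_size, len(self.storage))
        batch = self.rng.sample(list(self.storage), batch_size)
        observations, labels = zip(*batch)
        return list(observations), list(labels)

    def __len__(self):
        return len(self.storage)


# Usage: fill the buffer in temporal order, then sample a shuffled batch.
buffer = ReplayBuffer(capacity=1000, seed=0)
for episode in range(10):
    for step in range(50):
        buffer.add((episode, step), f"label-{episode}-{step}")

observations, labels = buffer.sample(8)
```

Data is written in the order the policy collects it, but each mapper update reads a random subset, which is what breaks the temporal correlation. The memory cost is set by `capacity`, so a smaller buffer trades decorrelation for memory.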

Thanks a lot. I might have to train them separately, since I do not have enough memory for such a large replay buffer.