Mapper training during policy training

Question

fraazor opened this issue 3 years ago · 2 comments

Hi there,

I have been working on a variation of the projection unit that adds a different type of "fog of war" approach for the sensors. However, I do not fully understand the code, because mapper training and policy training seem to happen simultaneously. Wouldn't that feed the supervised part with sequential, correlated data/label pairs? Is there some shuffling happening that I am missing?
I would really appreciate an explanation of how this was approached.

Answer 1 · 2021-05-24T17:59:53.000Z

We use a large replay buffer that stores data from many episodes. We sample randomly from this buffer to break the correlation between consecutive data points.
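A minimal sketch of the idea in the answer, assuming a simple fixed-capacity buffer of (observation, label) pairs. The class name `ReplayBuffer`, its methods, and the stored fields are hypothetical illustrations, not the project's actual code; the real buffer may store different data (e.g. sensor frames and ground-truth map crops).

```python
import random
from typing import Any, List, Optional, Tuple


class ReplayBuffer:
    """Hypothetical fixed-capacity ring buffer of (observation, label) pairs."""

    def __init__(self, capacity: int, seed: Optional[int] = None):
        self.capacity = capacity
        self.storage: List[Tuple[Any, Any]] = []
        self.next_idx = 0  # slot that will be written next (the oldest once full)
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.storage)

    def add(self, observation: Any, label: Any) -> None:
        item = (observation, label)
        if len(self.storage) < self.capacity:
            self.storage.append(item)
        else:
            # Buffer is full: overwrite the oldest entry so memory stays bounded.
            self.storage[self.next_idx] = item
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size: int) -> Tuple[List[Any], List[Any]]:
        # Uniform sampling without replacement over everything currently stored,
        # so a batch mixes steps from many episodes instead of consecutive steps
        # of the current one. Raises ValueError if batch_size > len(self).
        batch = self.rng.sample(self.storage, batch_size)
        observations, labels = zip(*batch)
        return list(observations), list(labels)
```

During joint training, each environment step would call `add(...)`, and each mapper update would draw a batch with `sample(...)`. The capacity controls the trade-off raised in the reply below: a larger buffer spans more episodes and gives less correlated batches, but costs proportionally more memory.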

Answer 2 · 2021-05-26T19:03:07.000Z

Thanks a lot. I might have to train them separately, since I do not have enough memory for such a large replay buffer.