Port for Minigrid Environments
dimitrisTim opened this issue · 3 comments
Hello, and congrats on the nice paper!
I am working on my master's thesis using your code as my codebase. However, I have integrated and am using the Minigrid environments instead of the custom mazes you created.
Now, if I understand correctly, the concept of "Memory" refers to the number of timesteps of data our policy is fed (for each batch).
If I use a Minigrid environment, should the code theoretically work as is?
If not, what changes should I do?
Thanks in advance!
Hi,
Sorry, I did not see this issue in time. In the paper I discussed the MiniGrid-Memory environment, whose (optimal) policy memory length can be quite short.
To clarify, the number of timesteps of data fed to the policy is the policy context length, not necessarily the memory length, because an RNN policy may struggle to capture distant information even if it has seen it.
The T-Maze code needs some adjustments for Minigrid: a more expressive observation encoder (e.g., a CNN), and possibly a shorter policy context length (the current code uses the full episode length, which may be too slow for Minigrid).
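To make the context-length point concrete, here is a minimal sketch, assuming plain Python, of capping the policy context at the last `context_length` timesteps instead of the full episode. `ContextWindow` is a hypothetical helper, not part of the T-Maze code; in a real setup each step would hold whatever the sequence model consumes per timestep (e.g., observation, previous action, and reward).

```python
from collections import deque
from typing import Deque, Generic, List, TypeVar

T = TypeVar("T")


class ContextWindow(Generic[T]):
    """Hypothetical helper that keeps only the most recent timesteps.

    The policy is fed at most `context_length` steps of history instead of
    the full episode, trading long-range recall for speed.
    """

    def __init__(self, context_length: int) -> None:
        if context_length < 1:
            raise ValueError("context_length must be at least 1")
        # A deque with maxlen drops the oldest step automatically when full.
        self._steps: Deque[T] = deque(maxlen=context_length)

    def reset(self) -> None:
        """Clear the history at the start of a new episode."""
        self._steps.clear()

    def append(self, step: T) -> List[T]:
        """Add the newest step and return the current context, oldest first."""
        self._steps.append(step)
        return list(self._steps)


if __name__ == "__main__":
    window: ContextWindow[int] = ContextWindow(context_length=3)
    for t in range(5):
        context = window.append(t)
    print(context)  # [2, 3, 4]
```

If the task's memory length exceeds `context_length`, the information needed for the decision falls out of the window entirely, so the window should be at least as long as the task's memory length. Conversely, a long window does not guarantee the model actually uses distant steps.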
Let me know if you have further questions.
Thanks a lot for your reply!
My master's thesis studies the effects of using GPT2 as the encoder (instead of an LSTM or another RNN) together with self-predictive representations (your other project, minZP).
Right now I am feeding the history to the GPT2 encoder, learning observation representations, and studying the effects on learning efficiency and generalization.
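The self-predictive part can be sketched as a ZP-style latent prediction loss: a latent model predicts the next latent from the current latent and action, and is regressed onto the next latent produced by a slowly updated (EMA) target encoder. This is a minimal sketch assuming plain Python lists as vectors, not the minZP implementation; the function names are hypothetical, and in an autodiff framework the targets would be detached (stop-gradient) while the GPT2 encoder would produce the latents from the history.

```python
from typing import List, Sequence

Vector = List[float]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def zp_loss(predicted: Sequence[Vector], targets: Sequence[Vector]) -> float:
    """Mean squared L2 error between predicted next latents P(z_t, a_t)
    and target next latents z_{t+1} (treated as constants, i.e. stop-gradient)."""
    if not predicted or len(predicted) != len(targets):
        raise ValueError("need the same non-zero number of predictions and targets")
    return sum(squared_l2(p, t) for p, t in zip(predicted, targets)) / len(predicted)


def ema_update(target: Sequence[float], online: Sequence[float], tau: float) -> Vector:
    """EMA (Polyak) update: target <- tau * target + (1 - tau) * online."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


if __name__ == "__main__":
    print(zp_loss([[1.0, 2.0]], [[1.0, 0.0]]))  # 4.0
    print(ema_update([0.0, 0.0], [1.0, 2.0], tau=0.5))  # [0.5, 1.0]
```

A `tau` close to 1 makes the target encoder change slowly, which is the usual motivation for an EMA target: it keeps the regression target from chasing the online encoder on every step.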
I was also wondering whether I should train with random samples from the R2D2 buffer or with whole episodes, as in your MemoryRL project. I guess I will run an ablation for both cases.
If you are interested in the results or the thesis topic, we could also have a brief discussion about it :)
Regarding training on random samples from the R2D2 buffer versus whole episodes: the R2D2 buffer will be useful if you do not train on whole episodes.
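A minimal sketch, assuming episodes are stored whole, of the two sampling options discussed here: `sample_episode` returns a full episode (closer to the whole-episode setup), while `sample_subsequence` returns an R2D2-style fixed-length chunk split into a burn-in prefix (used to warm up the sequence model's state without a loss) and a training part. `SequenceReplayBuffer` and its methods are hypothetical names. R2D2 itself stores overlapping fixed-length sequences that do not cross episode boundaries, and full implementations typically pad and mask short sequences; this sketch simplifies that by shortening the burn-in when an episode is too short.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


class SequenceReplayBuffer:
    """Hypothetical replay buffer that stores whole episodes."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.episodes: List[List[Any]] = []
        self.rng = random.Random(seed)

    def add_episode(self, episode: Sequence[Any]) -> None:
        if not episode:
            raise ValueError("episode must be non-empty")
        self.episodes.append(list(episode))

    def sample_episode(self) -> List[Any]:
        """Whole-episode sampling: return a copy of one stored episode."""
        return list(self.rng.choice(self.episodes))

    def sample_subsequence(self, burn_in: int, train_length: int) -> Tuple[List[Any], List[Any]]:
        """R2D2-style sampling: a random window split into (burn_in, train) parts."""
        episode = self.rng.choice(self.episodes)
        total = burn_in + train_length
        # Pick a start so the window fits inside the episode when possible.
        start = self.rng.randint(0, max(0, len(episode) - total))
        window = episode[start:start + total]
        # If the episode is too short, shrink the burn-in first so the
        # training part keeps as many steps as possible.
        burn = min(burn_in, max(0, len(window) - train_length))
        return window[:burn], window[burn:]


if __name__ == "__main__":
    buffer = SequenceReplayBuffer(seed=0)
    buffer.add_episode(list(range(100)))
    burn_part, train_part = buffer.sample_subsequence(burn_in=10, train_length=20)
    print(len(burn_part), len(train_part))  # 10 20
```

For a transformer encoder such as GPT2, one possible adaptation (an assumption, not something the thread prescribes) is to feed the burn-in steps as extra context and exclude their outputs from the loss.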
I am interested; could you send me an email so we can set up a brief meeting?