Question: Did anyone solve MiniGrid-DoorKey-8x8-v0?

ErikVester opened this issue 3 years ago · 1 comment

Hi all,

Has anyone solved the MiniGrid-DoorKey-8x8-v0 environment with the PPO algorithm? If so, which hyperparameters did you use, and for how many environment steps and frames did you run it?

Thanks! :)

Kind regards,

Erik

Answer 1 · 2022-10-04T01:48:46.000Z

I changed the default value of max_steps in doorkey.py from 10 * size * size to 100 * size * size, and it works fine. I also increased the number of frames to 800000, though it is clearly learning long before it gets that far. The problem with the default settings is that in a room that large the agent usually fails to reach the goal before the episode ends, which means there's no reward, so nothing is learned. The main thing is to let each episode run long enough that the agent gets rewarded often enough to learn.
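
To put rough numbers on this, here is a small standalone sketch (it does not import MiniGrid). The function names are made up for illustration, and the success reward of 1 - 0.9 * (step_count / max_steps) is my understanding of how MiniGrid scores reaching the goal, so treat it as an assumption and check it against your installed version. The 1000-step episode is a made-up example, not a measurement.

```python
def max_steps_for(size: int, factor: int = 10) -> int:
    """Step limit as described above: factor * size * size (default factor 10)."""
    return factor * size * size


def episode_reward(steps_needed: int, max_steps: int) -> float:
    """Reward for an episode in which the agent would need `steps_needed`
    steps to reach the goal.

    Assumption: reaching the goal pays 1 - 0.9 * (step_count / max_steps),
    and an episode cut off at max_steps pays nothing.
    """
    if steps_needed > max_steps:
        return 0.0  # episode ends before the goal is reached
    return 1.0 - 0.9 * (steps_needed / max_steps)


size = 8
default_limit = max_steps_for(size)               # 10 * 8 * 8
longer_limit = max_steps_for(size, factor=100)    # 100 * 8 * 8

# Hypothetical early-training episode: the agent stumbles onto the goal
# only after 1000 steps of mostly random exploration.
steps_needed = 1000
print("default limit:", default_limit, "reward:", episode_reward(steps_needed, default_limit))
print("longer limit:", longer_limit, "reward:", episode_reward(steps_needed, longer_limit))
```

With the default limit that episode is cut off and contributes no reward signal at all, while with the longer limit the same slow episode still ends in a positive reward that PPO can learn from.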