kenjyoung/MinAtar

deterministic versions of environments

Closed this issue · 3 comments

Hi,

I'd like to ask how to make the MinAtar environments deterministic for research on MBRL algorithms such as MuZero, which are designed for deterministic environments.

Apart from setting sticky_action_prob to 0.0, is there anything else I need to change to get deterministic transitions (stochastic initialization is fine)?

It seems that for space_invaders and breakout, there is no stochasticity in the transition. Can you confirm? What would be your advice to make the other environments' transitions deterministic?

Thanks in advance for your answers.

Hey Jinkehe,

This is a good question. It would be tricky to do this for all environments in a satisfying way. In particular, Seaquest and Asterix involve randomized enemy spawning. You could technically make this deterministic by just fixing the random seed, but that's not really ideal: to capitalize on it you would have to use a history-dependent model and just memorize the spawn sequence.

I believe Breakout should be fine without modification; the dynamics are already deterministic and fully observable.

Space_invaders is also deterministic; however, it is partially observable due to the timing of enemy movement. If you are using a history-dependent model this shouldn't be an issue, but if your model is Markov this would make the transitions appear stochastic. You may also want to disable difficulty ramping, since that changes the timing of enemy movement each time a set of invaders is cleared, adding further partial observability issues.
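To illustrate why hidden movement timing looks like stochasticity to a Markov model, here is a minimal sketch. It does not use MinAtar: `ToyInvader` and `find_ambiguous_transitions` are hypothetical names made up for this example, and the toy enemy simply moves one cell every `period` steps while the timer is kept out of the observation.

```python
from collections import defaultdict


class ToyInvader:
    """Hypothetical 1-D stand-in (not MinAtar): the enemy moves every `period` steps.

    The internal timer is NOT part of the observation, mimicking hidden
    enemy-movement timing.
    """

    def __init__(self, width=6, period=2):
        self.width = width
        self.period = period
        self.reset()

    def reset(self):
        self.enemy = 0
        self.timer = 0

    def observe(self):
        # Only the enemy position is visible; the timer stays hidden.
        return (self.enemy,)

    def act(self, action):
        # The action is ignored; only the timing matters for this illustration.
        self.timer += 1
        if self.timer == self.period:
            self.timer = 0
            self.enemy = (self.enemy + 1) % self.width


def find_ambiguous_transitions(env, actions):
    """Return {(obs, action): successors} for pairs seen with more than one successor."""
    successors = defaultdict(set)
    env.reset()
    for action in actions:
        obs = env.observe()
        env.act(action)
        successors[(obs, action)].add(env.observe())
    return {key: nxt for key, nxt in successors.items() if len(nxt) > 1}


ambiguous = find_ambiguous_transitions(ToyInvader(), actions=[0] * 24)
print(len(ambiguous), "observation-action pairs have more than one successor")
```

The toy dynamics are fully deterministic, yet every observation has two possible successors (enemy stays or enemy moves), which is exactly what a model conditioned only on the current observation would have to treat as noise.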

I believe Freeway could be made deterministic by simply not calling _randomize_cars when the agent reaches the finish. It would still be partially observable due to the timing of enemy movement just like space_invaders.

Hope this helps!

If full observability is important for your use case, you could make space_invaders and freeway fully observable by adding a "clock" to the input and then syncing enemy movement steps to the clock. In space_invaders this would be straightforward, since everything already moves at once. In freeway it would be a bit more involved: you could include a separate clock for each of the 5 car speeds. Freeway also has an additional timer for player movement, so you could include another clock for that as well.
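The clock idea can be sketched on a toy example. Again, this is not MinAtar code: `ClockedToyInvader` and `is_markov_deterministic` are hypothetical names, and the assumption is that the enemy moves exactly when a periodic clock wraps around to 0, with the clock optionally included in the observation.

```python
class ClockedToyInvader:
    """Hypothetical toy env (not MinAtar): the enemy moves whenever the clock wraps to 0."""

    def __init__(self, width=6, period=2, expose_clock=True):
        self.width = width
        self.period = period
        self.expose_clock = expose_clock
        self.reset()

    def reset(self):
        self.enemy = 0
        self.clock = 0

    def observe(self):
        # With expose_clock=True the clock is part of the input.
        if self.expose_clock:
            return (self.enemy, self.clock)
        return (self.enemy,)

    def act(self, action):
        # Enemy movement is synced to the clock rather than to a hidden timer.
        self.clock = (self.clock + 1) % self.period
        if self.clock == 0:
            self.enemy = (self.enemy + 1) % self.width


def is_markov_deterministic(env, num_steps):
    """True if each observation was always followed by the same next observation."""
    seen = {}
    env.reset()
    for _ in range(num_steps):
        obs = env.observe()
        env.act(0)  # single dummy action; the toy env ignores it
        nxt = env.observe()
        if seen.setdefault(obs, nxt) != nxt:
            return False
    return True


with_clock = is_markov_deterministic(ClockedToyInvader(expose_clock=True), 24)
without_clock = is_markov_deterministic(ClockedToyInvader(expose_clock=False), 24)
print("with clock:", with_clock, "| without clock:", without_clock)
```

In this sketch the clock is a single integer appended to the observation tuple. In an image-like input it would presumably be encoded as one or more extra channels, and for freeway there would be one such clock per movement timer.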

Thanks a lot for your quick reply. It was super helpful!
I didn't know that space_invaders in its current form is partially observable.
For now I will stick to Breakout and will look into the others in the future if needed.