openai/procgen

Agent acts non-deterministically

hfeniser opened this issue · 0 comments

I trained an agent in the Maze environment of the Procgen benchmark, and I am now testing it on various game levels. However, I noticed that the agent acts non-deterministically. For example, I fixed the level by setting num_levels=1 and start_level=97, and I got the following sequences of actions from the agent in two different runs:

1st play: [7] [8] [5] [5] [5] [5] [2] [5] [2] [5] [5] [8] [8] [5] [5] [5]
2nd play: [8] [8] [5] [5] [5] [5] [2] [2] [5] [5] [8] [8] [5] [5] [5]

Note that the agent is able to get the cheese in every run, although it takes different actions in some steps.
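
To make the divergence easier to see, here is a minimal sketch that lines up the two action sequences above step by step. The helper name `diff_steps` is made up for illustration and is not part of Procgen; the action lists are copied from the two plays listed above.

```python
from itertools import zip_longest

# Action sequences copied from the two plays reported above.
first_play = [7, 8, 5, 5, 5, 5, 2, 5, 2, 5, 5, 8, 8, 5, 5, 5]
second_play = [8, 8, 5, 5, 5, 5, 2, 2, 5, 5, 8, 8, 5, 5, 5]


def diff_steps(a, b):
    """Return (step, action_a, action_b) for every step where the runs differ.

    Hypothetical helper for illustration only. If one run is shorter,
    its missing actions are reported as None.
    """
    return [
        (step, x, y)
        for step, (x, y) in enumerate(zip_longest(a, b))
        if x != y
    ]


differences = diff_steps(first_play, second_play)
for step, x, y in differences:
    print(f"step {step}: 1st play took {x}, 2nd play took {y}")
```

The two runs also differ in length, so a plain `zip` would silently drop the last action of the longer run; `zip_longest` keeps it visible.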