lcswillems/torch-ac

ParallelEnv class yields non-correct rewards in a minigrid environment

ycemsubakan opened this issue · 3 comments

I tried to use the ParallelEnv class to run episodes in parallel, with this MiniGrid environment: https://github.com/maximecb/gym-minigrid/blob/master/README.md (MiniGrid-Empty-5x5-v0). The rewards should be (1 - c*time_taken_toreachgreen), where c is a constant, but with ParallelEnv the rewards do not follow this. I am actually observing that the rewards increase with time.
Example: Say we have 10 step episodes. Normally we should be observing this type of rewards:
[0, 0, 0.95, 0, 0, 0.9, 0, 0, 0.85, 0]
(This is a list where the first element is the reward obtained at t=0, the second element is the reward at t=1, and so on.)
But I am observing rewards like this with ParallelEnv:
[0, 0, 0.95, 0, 0, 0.95, 0, 0, 0.95, 0]
or even increasing rewards like the following:
[0, 0, 0.85, 0, 0, 0.90, 0, 0, 0.95, 0]
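
To make the two patterns above concrete, here is a minimal toy sketch in plain Python. It does not use gym-minigrid or torch-ac. The function `reward_trace`, the flag `reset_on_goal`, and the value c = 1/60 are all hypothetical, chosen only so that the numbers match the lists above. It shows that a reward of (1 - c*steps) gives the decreasing list when steps are counted from t=0 and never reset, and gives the constant 0.95 list when the step counter restarts each time the goal is reached. Whether ParallelEnv or the environment actually restarts the counter this way is an assumption, not something confirmed in this thread.

```python
def reward_trace(goal_times, horizon, c=1 / 60, reset_on_goal=False):
    """Toy per-step reward list; pays 1 - c * steps whenever the goal is reached.

    goal_times: set of time indices at which the goal is reached (hypothetical).
    reset_on_goal: if True, the step counter restarts after each goal,
                   as it would if every goal ended an episode.
    """
    rewards = []
    steps = 0  # steps counted since the counter was last reset
    for t in range(horizon):
        steps += 1
        if t in goal_times:
            rewards.append(round(1 - c * steps, 2))
            if reset_on_goal:
                steps = 0
        else:
            rewards.append(0)
    return rewards


# Step counter never reset: rewards shrink as t grows.
global_clock = reward_trace({2, 5, 8}, horizon=10)

# Step counter restarted after each goal: every 3-step episode pays the same.
per_episode_clock = reward_trace({2, 5, 8}, horizon=10, reset_on_goal=True)

print(global_clock)
print(per_episode_clock)
```

Under this assumption, the increasing list would correspond to episodes that get shorter over time, for example as the agent improves.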

I might be misunderstanding the purpose of the ParallelEnv class. My understanding was that it gives fully independent episodes and should not change the original reward structure. It would be great if you could let me know how I could fix this. Thank you!

ParallelEnv just runs the agent on several environments in parallel. I don't see the link with the reward. Could you tell me how to reproduce the bug?

I have written my own code for this, and I will try to push an example that showcases the bug shortly. In the meantime, just try the environment MiniGrid-Empty-5x5-v0 and compare the rewards within an episode with and without ParallelEnv. (Even with a single environment in ParallelEnv the rewards are not correct; I am guessing something is wrong with the time indexing?)

Hello, did you manage to fix the bug? I am currently trying to test the same thing.