facebookresearch/minihack

[BUG] Inconsistent environment seeding

jlin816 opened this issue · 2 comments

🐛 Bug

Seeding doesn't consistently generate the same environment.

To Reproduce

Steps to reproduce the behavior:

  1. Run this snippet repeatedly:
import gym
import minihack  # registers the MiniHack-* environments with gym

env = gym.make("MiniHack-KeyRoom-Fixed-S5-v0",
    observation_keys=("pixel", "colors", "chars", "glyphs", "tty_chars"),
    seeds=(42, 42, False))
env.seed(42, 42, False)
obs = env.reset()
env.render()
print(env.get_seeds())

Sometimes this prints

Hello Agent, welcome to NetHack!  You are a chaotic male human Rogue.           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                       ----                                     
                                       |..|                                     
                                       +(.|                                     
                                    ----..|                                     
                                    |.....|                                     
                                    |...@.|                                     
                                    -------                                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
Agent the Footpad              St:18/02 Dx:18 Co:13 In:8 Wi:9 Ch:7 Chaotic S:0  
Dlvl:1 $:0 HP:12(12) Pw:2(2) AC:7 Xp:1/0                                        
(42, 42, False)

But also occasionally prints (note the printed seeds are (0, 0, False)):

Hello Agent, welcome to NetHack!  You are a chaotic male human Rogue.           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                       ----                                     
                                       |@.|                                     
                                       +..|                                     
                                       -..|                                     
                                        ..|                                     
                                        ..|                                     
                                       ----                                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
Agent the Footpad              St:14 Dx:18 Co:14 In:11 Wi:11 Ch:8 Chaotic S:0   
Dlvl:1 $:0 HP:12(12) Pw:2(2) AC:7 Xp:1/0                                        
(0, 0, False)

Expected behavior

Same positions of agent/key, and same seeds being printed by env.get_seeds()

Environment


MiniHack version: 0.1.3+57ca418
NLE version: 0.8.1
Gym version: 0.21.0
PyTorch version: 1.11.0+cu102
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Ubuntu 20.04.3 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
CMake version: version 3.23.1

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3080
GPU 1: NVIDIA GeForce RTX 3080

Nvidia driver version: 510.47.03
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.11.0
[conda] torch                     1.11.0                   pypi_0    pypi

Hi @jlin816

Thanks for pointing this out and apologies for a late reply.

I see where the issue is. First, MiniHack's seeding differs slightly from NLE's. The seeds argument of a MiniHack environment expects a list of integers, which is used as the training distribution of levels for an agent, e.g. [1, 3, 9, 27]. (Perhaps the documentation should make this clearer.)

minihack/minihack/base.py

Lines 175 to 177 in 2054e7f

seeds (list or None):
A list of random seeds for sampling episodes. If none, the
entire level distribution is used. Defaults to None.

Specifically, when reset() is called, MiniHack randomly samples one of the seeds, e.g. 27 (they are now stored as self._level_seeds, since we treat them as levels of the same environment), and sets it using nle.seed(27, 27, False), like this:

minihack/minihack/base.py

Lines 325 to 327 in 2054e7f

if self._level_seeds is not None:
    seed = random.choice(self._level_seeds)
    self.seed(seed, seed, reseed=False)
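The quoted lines also suggest why the original report sometimes printed (0, 0, False). This is a reading of the snippet, not something confirmed in this thread. If seeds=(42, 42, False) is treated as a list of level seeds, random.choice can pick False, and False behaves as the integer 0. A minimal standalone sketch follows; sample_level_seed is a hypothetical helper written for illustration and is not part of MiniHack or NLE:

```python
import random


def sample_level_seed(level_seeds, rng):
    """Hypothetical helper mirroring the quoted reset() logic.

    Picks one entry from level_seeds and uses it for both seed slots,
    the way self.seed(seed, seed, reseed=False) does in the quoted lines.
    """
    seed = rng.choice(level_seeds)
    # int(False) == 0, so a stray boolean in the tuple becomes seed 0.
    return (int(seed), int(seed), False)


# Sample with many independent generators to collect the possible outcomes.
outcomes = {
    sample_level_seed((42, 42, False), random.Random(i)) for i in range(200)
}
print(sorted(outcomes))
```

Under this reading, two of the three entries give seed 42 and the third gives seed 0, which would match the "occasionally" in the report.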

I understand this caused calls to seed() to be ignored whenever seeds was passed to the environment. Therefore, I added a new parameter to MiniHack's reset() function called sample_seed (defaults to True). If True, reset() randomly samples a level from the original list. If False, it does not, so manually setting the level seed with NLE's seed() function works as desired.
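The interaction between seed() and reset(sample_seed=...) can be sketched with a toy class. ToySeededEnv is a hypothetical stand-in that models only the seeding bookkeeping described above. It is not the MiniHack implementation, and the actual change in the PR may be structured differently:

```python
import random


class ToySeededEnv:
    """Hypothetical stand-in for a MiniHack env (seeding logic only)."""

    def __init__(self, seeds=None):
        self._level_seeds = seeds      # list of level seeds, or None
        self._seeds = (0, 0, False)    # (core, disp, reseed), as get_seeds() reports

    def seed(self, core, disp, reseed=False):
        self._seeds = (core, disp, reseed)

    def get_seeds(self):
        return self._seeds

    def reset(self, sample_seed=True):
        # With sample_seed=True, any manually set seed is overwritten
        # by a seed drawn from the level list.
        if sample_seed and self._level_seeds is not None:
            seed = random.choice(self._level_seeds)
            self.seed(seed, seed, reseed=False)
        return self.get_seeds()


env = ToySeededEnv(seeds=[1, 3, 9, 27])

env.seed(42, 42, False)
sampled = env.reset()                   # manual seed is replaced by a sampled one

env.seed(42, 42, False)
manual = env.reset(sample_seed=False)   # manual seed is kept

print(sampled, manual)
```

In this sketch the first reset() reports one of the four level seeds, while the second reports the manually set (42, 42, False).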

Here is the PR #68. Please let me know if it works for you.

With this new PR, here is how one would use the seeding functionality:

import minihack, gym
env = gym.make("MiniHack-KeyRoom-Fixed-S5-v0",
    observation_keys=("pixel", "colors", "chars", "glyphs", "tty_chars"),
    seeds=[1, 3, 9, 27])

For now let's sample random episodes a few times

obs = env.reset()
env.render()
print(env.get_seeds())

This outputs

You are lucky!  Full moon tonight.










                                          |
                                    ----..|
                                    |.....|
                                    |@.(..|
                                    -------






Agent the Footpad              St:13 Dx:17 Co:13 In:13 Wi:13 Ch:9 Chaotic S:0
Dlvl:1 $:0 HP:12(12) Pw:2(2) AC:7 Xp:1/0

(27, 27, False)

or perhaps

You are lucky!  Full moon tonight.








                                       ----
                                       |..|
                                       +..|
                                    ----..|
                                    |.....|
                                    |.(..@|
                                    -------






Agent the Footpad              St:14 Dx:18 Co:13 In:9 Wi:12 Ch:9 Chaotic S:0
Dlvl:1 $:0 HP:12(12) Pw:2(2) AC:7 Xp:1/0

(3, 3, False)

Now when we manually set the seed and use sample_seed=False, we will get the exact level we want

env.seed(42, 42, False)
obs = env.reset(sample_seed=False)
env.render()
print(env.get_seeds())

will result in the following.

You are lucky!  Full moon tonight.








                                       ----
                                       |..|
                                       +(.|
                                    ----..|
                                    |.....|
                                    |...@.|
                                    -------






Agent the Footpad              St:18/02 Dx:18 Co:13 In:8 Wi:9 Ch:7 Chaotic S:0
Dlvl:1 $:0 HP:12(12) Pw:2(2) AC:7 Xp:1/0

(42, 42, False)