holderlb/WSU-SAILON-NG

Apparent bug in 0.7.8 with episode_seed

mmclure opened this issue · 5 comments

Hi, I believe we are encountering a bug that causes our vizdoom server to crash when running locally. From what I understand, the problem seems to be that the new episode_seed property on the TA2Logic class is initialized to None, and it eventually clobbers the random seed from our TA2 config, such that the None gets passed to the game (game.set_seed(self.seed) in the SailonViz class), which causes the game server to barf.

I'm not sure what the intended behavior is, but I've verified that if we call self.episode_seed = self._seed during the init of our TA2Agent class, right after we call super().__init__(), (to init the TA2Logic, where self.episode_seed gets set to None), then the seed from our config is what eventually gets passed to game.set_seed(self.seed).

That seems to give us a work-around for now, but obviously a fix, or some clarification as to why its not a bug and what we should be doing differently, would be great.

Thanks!

If it helps, here is a stack trace of the hard error we encountered, and below that are some notes from my tracing of the issue which led me to the work-around above.

ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m 2021-12-29 02:39:40,637 - objects.GENERATOR_logic.Generator - INFO - on_generator_request( {"obj_type": "start_generator", "domain": "vizdoom", "novelty": 200, "difficulty": "easy", "seed": 25853374, "server_rpc_queue": "rpc.server.v0.7.81a5ddac8f684e0e9d6f9bbf05bafd50", "trial_novelty": 200, "epoch": 1640745580.6338296, "day_offset": 0, "request_timeout": 590, "use_image": false, "generator_config": {"episode_seed": null, "start_zeroed_out": false, "start_world_state": null}} )
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m Exception in thread Thread-11:
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m Traceback (most recent call last):
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m   File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m     self.run()
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m   File "GENERATOR.py", line 69, in run
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m     ta2_generator_config=self.ta2_generator_config))
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m   File "/aiq-sail-on/env_generator/test_handler.py", line 40, in __init__
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m     ta2_generator_config=self.ta2_generator_config)
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m   File "/aiq-sail-on/env_generator/test_loader.py", line 89, in __init__
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m     self.load_test()
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m   File "/aiq-sail-on/env_generator/test_loader.py", line 142, in load_test
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m     self.seed, self.difficulty, path=self.path, use_gui=self.use_gui)
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m   File "/aiq-sail-on/env_generator/envs/vizdoom/viz.py", line 110, in __init__
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m     game.set_seed(self.seed)
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m TypeError: set_seed(): incompatible function arguments. The following argument types are supported:
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m     1. (self: vizdoom.vizdoom.DoomGame, arg0: int) -> None
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m Invoked with: <vizdoom.vizdoom.DoomGame object at 0x7f4f775b8030>, None
ESC[35mmmclure-vizdoom-mockn-gen    |ESC[0m
  • episode_seed property gets initialized to None here: [https://github.com/holderlb/WSU-SAILON-NG/blob/8c21281333c9d1a5410182cb8fb84f4c31051403/WSU-Portable-Generator/source/objects/TA2_logic.py#:~:text=self.episode_seed%20%3D%20None]
  • It gets added (as None) to the generator_config object here: [https://github.com/holderlb/WSU-SAILON-NG/blob/8c21281333c9d1a5410182cb8fb84f4c31051403/WSU-Portable-Generator/source/objects/TA2_logic.py#:~:text=generator_config%20%3D%20dict(%7B%27episode_seed%27%3A%20self.episode_seed%2C]
  • The generator_config gets passed around and eventually the episode_seed gets pulled from it here, and (even as None) it overwrites our random seed that we set in the config: [https://github.com/holderlb/WSU-SAILON-NG/blob/8c21281333c9d1a5410182cb8fb84f4c31051403/WSU-Portable-Generator/source/partial_env_generator/test_loader.py#:~:text=%23%20Set%20the%20custom,self.ta2_generator_config%5B%27episode_seed%27%5D]
  • The hard error occurs here, because the game doesn't like that set_seed gets called with None: [https://github.com/holderlb/WSU-SAILON-NG/blob/e90efa869c1fba4cffe141931a628186ec804c0e/WSU-Portable-Generator/source/partial_env_generator/envs/vizdoom/viz.py#:~:text=seed(self.seed)-,game.set_seed(self.seed),-%23%20Call%20this%20at]

Update: My work-around is not valid. It seems to seed every episode in a trial with the same seed, so initial conditions always look the same for games within an experiment. :/

Update follow-up (some elaboration): The reason we cared about running on the most recent vizdoom version is because we pre-can training data to bootstrap our novelty detector, and if there's any reason that our (wide-ranging) sensor values might shift with the latest version, we wanted that shift to be represented in the pre-canned data.

Since the work-around didn't work, we are now pre-canning data based on the version we ran with before. This is not a pressing problem if at least one of the following is true:

  1. There is no reason to think that any shift in sensor data is possible - basically if in the new version the games unfold the same way they did before, in expectation.
  2. You can re-evaluate our agent on the same version you evaluated us on before.

Thanks!

Getting started on fixing this, the episode_seed was rushed to get it handed out with the new novelties for the cartpole teams and I missed that interaction issue.

I'm running some tests to confirm it is working with the different permutations of configs. If you need a quick fix right now:

--- a/source/partial_env_generator/test_loader.py
+++ b/source/partial_env_generator/test_loader.py
@@ -36,7 +36,8 @@ class TestLoader:
         # Set the custom seed if provided.
         if self.ta2_generator_config is not None:
             if 'episode_seed' in self.ta2_generator_config:
-                self.seed = self.ta2_generator_config['episode_seed']
+                if self.ta2_generator_config['episode_seed'] is not None:
+                    self.seed = self.ta2_generator_config['episode_seed']
 
         # Determine level here
         self.use_mock = False

I am moving the 3 config settings from the TA2.py init function to the config file so everything is in the same place (instead of being spread between config files and class inits), as well as updating the READMEs.

Fantastic, thanks for the quick attention to this!