oxwhirl/smac

Environment inconsistency bug when reset() is called twice at the end of an episode.

binary-husky opened this issue · 1 comments

We found that when the SMAC environment is reset twice at the end of an episode, it ends up in an inconsistent state and produces strange results. For example, a model that should reach a 95%+ win rate drops to below 50%.

Method to reproduce:

  • Find a trained model on map MMM2 and freeze it for evaluation.
  • Change `res = self._env.reset()` to `self._env.reset(); res = self._env.reset()` (reset twice).
  • Observe a significant decline in win rate.
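
The steps above can be sketched as a small evaluation helper. This is only a rough sketch, not the code from our runner: `win_rate` and `choose_actions` are hypothetical names, and it assumes the env follows the SMAC interface, where `reset()` starts an episode, `step(actions)` returns `(reward, terminated, info)`, and `info` carries `battle_won` when the episode ends.

```python
from typing import Callable


def win_rate(env, choose_actions: Callable, episodes: int, double_reset: bool) -> float:
    """Return the fraction of episodes won.

    `choose_actions(env)` is a hypothetical callback that wraps the frozen
    model and returns one action per agent.
    """
    wins = 0
    for _ in range(episodes):
        env.reset()
        if double_reset:
            env.reset()  # the extra call from the second step above
        terminated = False
        info = {}
        while not terminated:
            _reward, terminated, info = env.step(choose_actions(env))
        # Assumption: the final info dict reports whether the battle was won.
        wins += int(bool(info.get("battle_won", False)))
    return wins / episodes
```

Running it once with `double_reset=False` and once with `double_reset=True` on the same frozen model should show the gap, if the problem reproduces on your setup.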

Although calling reset() twice is easy to avoid with an if-else check,
this is clearly a bug that can cause trouble for other users.
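
For reference, the if-else workaround could look roughly like the wrapper below. It is a sketch under assumptions, not part of SMAC: `ResetOnceGuard` is a hypothetical name, and it assumes that returning the cached result of the first `reset()` is acceptable when no `step()` has happened in between.

```python
class ResetOnceGuard:
    """Hypothetical wrapper that skips a reset() directly following another reset()."""

    def __init__(self, env):
        self._env = env
        self._fresh = False       # True after reset() until the next step()
        self._last_reset = None   # cached return value of the last real reset()

    def reset(self):
        if not self._fresh:
            self._last_reset = self._env.reset()
            self._fresh = True
        # A repeated reset() with no step() in between reuses the cached result.
        return self._last_reset

    def step(self, actions):
        self._fresh = False
        return self._env.step(actions)

    def __getattr__(self, name):
        # Delegate everything else (e.g. get_env_info) to the wrapped env.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._env, name)
```

This only hides the symptom on the caller side. The underlying inconsistency after a double reset would still need to be fixed in the environment itself.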

Maybe add a notice warning users not to call reset() more than once in a row.