Environment inconsistency bug when reset() is called twice at the end of an episode.
binary-husky opened this issue · 1 comments
binary-husky commented
We discovered that when the SMAC environment is reset twice at the end of an episode, it ends up in an inconsistent state that produces strange results, e.g. a model that should reach a 95%+ win rate drops below 50%.
Method to reproduce:
- find a trained model on map MMM2, freeze it for evaluation
- change
  res = self._env.reset()
  to
  self._env.reset(); res = self._env.reset()
  (reset twice)
- observe significant win rate decline
Although we can easily avoid calling reset() twice by adding an if-else check,
this is clearly a bug that can cause potential trouble.
binary-husky commented
Maybe add a notice to warn others not to mess with the reset function.