tambetm/simple_dqn

What's the purpose of _restartRandom?

Closed this issue · 5 comments

In agent.py:

def _restartRandom(self):
  self.env.restart()
  # perform random number of dummy actions to produce more stochastic games
  for i in xrange(random.randint(self.history_length, self.random_starts) + 1):
    reward = self.env.act(0)
    terminal = self.env.isTerminal()
    if terminal:
      self.env.restart()
    screen = self.env.getScreen()
    # add dummy states to buffer
    self.buf.add(screen)

I can't understand self.env.act(0): action 0 is NOOP, so when we start a new game and don't take action 1 (FIRE), the screen stays the same, because we only take NOOP actions. So what is the purpose of _restartRandom?

The Atari didn't have a timer chip and is therefore completely deterministic. Atari games usually rely on user actions to acquire randomness. With a learning algorithm it is therefore entirely possible that the algorithm just learns to perform one specific sequence of actions that consistently achieves a good result. This is not what we want; we want the algorithm to work in a variety of situations. For this reason we introduce artificial randomness into the game by varying the number of start actions. More recent variations on this theme include starting from human starts and also injecting randomness into the action repeat.
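For reference, the no-op start trick from the original DQN paper can be written against the raw ALE Python bindings roughly like this (a sketch only; the ale_python_interface import and the ROM path are my assumptions, not code from this repo):

import random
from ale_python_interface import ALEInterface

MAX_NOOPS = 30  # the DQN paper uses up to 30 no-op frames per episode start

ale = ALEInterface()
ale.loadROM("breakout.bin")  # assumed ROM path

ale.reset_game()
for _ in range(random.randint(1, MAX_NOOPS)):
  ale.act(0)  # NOOP: only advances the emulator's internal state
  if ale.game_over():
    ale.reset_game()
# the agent would start acting/learning from this randomized state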

I still can't understand why varying the number of start actions introduces artificial randomness into the game. I think when we start a new game the screen stays the same, so the randomness depends entirely on what actions we take. Maybe I know too little about the Atari emulator. Do you mean the randomness is the position where the bullet appears when we take the FIRE action? And what is the relationship between randomness and the number of start actions?

I think it depends on each game how it handles randomness. For example, I can imagine Breakout keeping a sum of all joystick codes since the start of the game and deciding whether the ball initially flies left or right based on the 0th bit of that sum. If the code for NOOP were 1, then depending on the number of initial NOOP actions the sum may turn out even or odd. Of course I have no idea how Breakout actually does it, and the code for NOOP probably isn't 1, but you get the general idea. This was the scheme used in the original DQN paper.
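To make that concrete, here is a purely made-up toy model (my own illustration, not how Breakout actually works): a deterministic "game" whose ball direction depends on an internal counter that keeps ticking even while the agent only sends NOOPs.

class ToyBreakout:
  # purely hypothetical: an internal counter advances on every action,
  # even NOOP, although the screen would look unchanged
  def __init__(self):
    self.frame = 0

  def act(self, action):
    self.frame += 1

  def launch_ball(self):
    # direction decided by the parity of the counter at FIRE time
    return "left" if self.frame % 2 == 0 else "right"

for noops in range(4):
  game = ToyBreakout()
  for _ in range(noops):
    game.act(0)  # NOOP: screen stays the same, counter still ticks
  print("%d initial NOOPs -> ball flies %s" % (noops, game.launch_ball()))

With 0 or 2 initial NOOPs the ball flies left, with 1 or 3 it flies right, so varying the number of start actions changes the game the agent actually sees.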

Of course injecting randomness into the beginning might not be enough, and that's why more recently ALE also adds randomness to the action repeat.

I have run some tests on Breakout. The Atari has 18 actions, from action 0 to 17, listed below:
NOOP, FIRE, UP, RIGHT, LEFT, DOWN, UPRIGHT, UPLEFT, DOWNRIGHT, DOWNLEFT, UPFIRE, RIGHTFIRE, LEFTFIRE, DOWNFIRE, UPRIGHTFIRE, UPLEFTFIRE, DOWNRIGHTFIRE, DOWNLEFTFIRE
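For reference, the full versus minimal action set can be inspected with the ALE Python bindings, roughly like this (a sketch; the import and ROM path are assumptions):

from ale_python_interface import ALEInterface

ale = ALEInterface()
ale.loadROM("breakout.bin")  # assumed ROM path
print(ale.getLegalActionSet())    # all 18 joystick actions, 0-17
print(ale.getMinimalActionSet())  # for Breakout typically [0, 1, 3, 4]: NOOP, FIRE, RIGHT, LEFT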

Breakout has 4 actions: NOOP, FIRE, LEFT, RIGHT. When we start a game there is no bullet (or ball) on the screen; you have to take the FIRE action to make a bullet appear at a random position on the screen. The bullet then travels towards the lower left or lower right, and when the paddle touches the bullet, the bullet is reflected according to the law of reflection.
That's why I say that when we start a new game and don't take the FIRE action (action id 1), the screen stays the same; _restartRandom takes only NOOP actions when a game starts, so the screen doesn't change. That's why I don't understand the purpose of your function _restartRandom.
I don't use your _restartRandom, but I find that my DQN can't learn well. I just set skip_frame = 4, but I see people use action_repeat instead of skip_frame.
I still wonder why action repeat can add randomness to the game. And if I don't use action repeat, will that cause the DQN to learn nothing?

  1. If you take NOOP actions the screen doesn't change, but some internal counter inside Breakout might.
  2. If you don't use _restartRandom, it shouldn't prevent the agent from learning. Rather, it should make learning easier, but the result possibly won't generalize to novel situations.
  3. Action repeat by itself doesn't introduce randomness, of course. But recent ALE versions vary the action repeat, i.e. if you want to repeat an action 4 times, it actually repeats it 3-5 times. See section 7.5 in the ALE manual; a rough sketch of the idea is below.
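Here is a rough sketch of that varying action repeat, written against the same env wrapper API as the code above (env.act / env.isTerminal); this is my illustration of the idea, not the actual ALE implementation:

import random

def act_with_stochastic_repeat(env, action, nominal_repeat=4, jitter=1):
  # repeat the action for nominal_repeat +/- jitter frames, summing the reward
  repeats = random.randint(nominal_repeat - jitter, nominal_repeat + jitter)
  total_reward = 0
  for _ in range(repeats):
    total_reward += env.act(action)
    if env.isTerminal():
      break
  return total_reward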