/MazeBase

Simple environment for creating very simple 2D games and training neural network models to perform tasks within them

Primary LanguageLuaOtherNOASSERTION

MazeBase: a sandbox for learning from games

This code is for a simple 2D game environment that can be used in developing reinforcement learning models. It is designed to be compact but flexible, enabling the implementation of diverse set of games. Furthermore, it offers precise tuning of the game difficulty, facilitating the construction of curricula to aid training. The code is in Lua+Torch, and it offers rapid prototyping of games and is easy to connect to models that control the agent’s behavior. For more details, see our paper.

Environment

Each game is played in a 2D rectangular grid. Each location in the grid can be empty, or may contain one or more items such as:

  • Block: an impassible obstacle that does not allow the agent to move to that grid location
  • Water: the agent may move to a grid location with water, but incurs an additional cost of for doing so.
  • Switch: a switch can be in one of M states, which we refer to as colors. The agent can toggle through the states cyclically by a toggle action when it is at the location of the switch .
  • Door: a door has a color, matched to a particular switch. The agent may only move to the door’s grid location if the state of the switch matches the state of the door.
  • PushableBlock: This block is impassable, but can be moved with a separate “push” actions. The block moves in the direction of the push, and the agent must be located adjacent to the block opposite the direction of the push.
  • Corner: This item simply marks a corner of the board.
  • Goal: depending on the game, one or more goals may exist, each named individually.
  • Info: these items do not have a grid location, but can specify a or give information necessary for its completion.

The environment is presented to the agent as a list of sentences, each describing an item in the game. For example, an agent might see “Block at [-1,4]. Switch at [+3,0] with blue color. Info: change switch to red.” However, note that we use egocentric spatial coordinates, meaning that the environment updates the locations of each object after an action. The environments are generated randomly with some distribution on the various items. For example, we usually specify a uniform distribution over height and width, and a percentage of wall blocks and water blocks.

Games

Currently, there are 10 different games implemented, but it is possible to add new games. The existing games are:

  • Multigoals: the agent is given an ordered list of goals as “Info”, and needs to visit the goals in that order.
  • Conditional Goals: the agent must visit a destination goal that is conditional on the state of a switch. The “Info” is of the form “go to goal 4 if the switch is colored red, else go to goal 2”.
  • Exclusion: the “Info” in this game specifies a list of goals to avoid. The agent should visit all other unmentioned goals.
  • Switches: there are multiple switches on the board, and the agent has to toggle all switches to the same color.
  • Light Key: there is a switch and a door in a wall of blocks. The agent should navigate to a goal which may be on the wrong side of a wall of blocks, in which the agent needs move to and toggle the switch to open the door before going to the goal.
  • Goto: the agent is given an absolute location on the grid as a target. The game ends when the agent visits this location. Solving this task requires the agent to convert from its own egocentric coordinate representation to absolute coordinates.
  • Goto Hidden: the agent is given a list of goals with absolute coordinates, and then is told to go to one of the goals by the goal’s name. The agent is not directly given the goal’s location, it must read this from the list of goal locations.
  • Push block: the agent needs to push a Pushable block so that it lays on top of a switch.
  • Push block cardinal: the agent needs to push a Pushable block so that it is on a specified edge of the maze, e.g. the left edge. Any location along the edge is acceptable.
  • Blocked door: the agent should navigate to a goal which may lie on the opposite side of a wall of blocks, as in the Light Key game. However, a PushableBlock blocks the gap in the wall instead of a door.

Examples of each games are shown in this video. The internal parameters of the games are written to a configuration file, which can be easily modified.

Using Game Environment

First, either install mazebase with luarocks make *.rockspec or add the appropriate path:

package.path = package.path .. ';lua/?/init.lua'

To use the game environment as standalone in Torch, first start a local display server with

$ th -ldisplay.start 8000 0.0.0.0

which will begin the remote desktop to view the MazeBase graphics at http://0.0.0.0:8000. See the full repo for more details. Next, include the init file with

g_mazebase = require('mazebase')

Then we have to set which config file to use. Here we are using the config file that used in our paper

g_opts = {games_config_path = 'mazebase/config/game_config.lua'}

Next, we call this function to create a dictionary with all necessary words used in the game

g_mazebase.init_vocab()            

Finally, initialize the game environment with

g_mazebase.init_game()

Now we can create a new game instance by calling

g = g_mazebase.new_game()

If there are more than one game, it will randomly pick one. Now, the current game state can be retrieved by calling

s = g:to_sentence()

which would return a tensor containing words (encoded by g_vocab dictionary) describing each item in the game. If you started the display server, you can see the game at 0.0.0.0:8000 on your browser by doing

g_disp = require('display')
g_disp.image(g.map:to_image())

sample_output

Next, an action can be performed by calling

g:act(action)

where action is the index of the action. The list of possible actions are in g.agent.action_names. When there are multiple agents in the game, we can choose the agent to perform the action by doing

g.agent = g.agents[i]

before calling g:act(). After the action is completed, g:update() must be called so that the game will update its internal state. Finally, we can check if the game is finished by calling g:is_active(). You can run demo_api.lua to see the game playing with random actions.

Creating a new game

Here we demonstrate how a new game can be added. Let us create a very simple game where an agent has to reach the goal. First, we create a file named SingleGoal.lua. In it, a game class has to be created

local SingleGoal, parent = torch.class('SingleGoal', 'MazeBase')

Next, we have to construct the game items. In this case, we only need a goal item placed at a random location:

function SingleGoal:__init(opts, vocab)
    parent.__init(self, opts, vocab)
    self:add_default_items()
    self.goal = self:place_item_rand({type = 'goal'})
end

Function place_item_rand puts the item on empty random location. But it is possible specify the location using place_item function. The argument to this function is a table containing item's properties such as type and name. Here, we only set the type of item to goal, but it is possible to include any number of attributes (e.g. color, name, etc.).

The game rule is to finish when the agent reaches the goal, which can be achieved by changing update function to

function SingleGoal:update()
    parent.update(self)
    if not self.finished then
        if self.goal.loc.y == self.agent.loc.y and self.goal.loc.x == self.agent.loc.x then
            self.finished = true
        end
    end
end

This will check if the agent's location is the same as the goal, and sets a flag when it is true. Finally, we have to give a proper reward when the goal is reached:

function SingleGoal:get_reward()
    if self.finished then
        return -self.costs.goal -- this will be set in config file
    else
        return parent.get_reward(self)
    end
end

Now, we include our game file in mazebase/init.lua by adding the following line

paths.dofile('SingleGoal.lua')

Also, the following lines has to be added inside init_game_opts function:

games.SingleGoal = SingleGoal
helpers.SingleGoal = OptsHelper

Finally, we need a config file for our new game. Let us create singlegoal.lua file in mazebase/config. The main parameters of the game is the grid size:

local mapH = torch.Tensor{5,5,5,10,1}
local mapW = torch.Tensor{5,5,5,10,1}

The first two numbers define lower and upper bounds of the parameter. The actual grid size will be uniformly sampled from this range. The remaining three numbers for curriculum training. In the easiest (hardest) case, the upper bound will be set to 3rd (4th) number. 5th number is the step size for changing the upper bound. In the same way, we define a percentage of grid cells to contain a block or water:

local blockspct = torch.Tensor{0,.05,0,.2,.01}
local waterpct = torch.Tensor{0,.05,0,.2,.01}

There are other generic parameters has be set, but see the actual config file for detail. Now we are ready to use the game!

Training an agent using neural networks

We also provide a code for training different types of neural models with policy gradient method. Training uses CPUs with multi-threading for speed up. The implemented models are

  1. multi-layer neural network
  2. convolutional neural network
  3. end-to-end memory network.

For example, running the following command will train a 2-layer network on MultiGoals.

th main.lua --hidsz 20 --model mlp --nlayers 2 --epochs 100 --game MultiGoals --nactions 6 --nworker 2

To see all the command line options, run

th main.lua -h
  --hidsz             the size of the internal state vector [20]
  --nonlin            non-linearity type: tanh | relu | none [tanh]
  --model             model type: mlp | conv | memnn [memnn]
  --init_std          STD of initial weights [0.2]
  --max_attributes    maximum number of attributes of each item [6]
  --nlayers           the number of layers in MLP [2]
  --convdim           the number of feature maps in convolutional layers [20]
  --conv_sz           spatial scope of the input to convnet and MLP [19]
  --memsize           size of the memory in MemNN [20]
  --nhop              the number of hops in MemNN [3]
  --nagents           the number of agents [1]
  --nactions          the number of agent actions [11]
  --max_steps         force to end the game after this many steps [20]
  --games_config_path configuration file for games [mazebase/config/game_config.lua]
  --game              can specify a single game []
  --optim             optimization method: rmsprop | sgd [rmsprop]
  --lrate             learning rate [0.001]
  --max_grad_norm     gradient clip value [0]
  --alpha             coefficient of baseline term in the cost function [0.03]
  --epochs            the number of training epochs [100]
  --nbatches          the number of mini-batches in one epoch [100]
  --batch_size        size of mini-batch (the number of parallel games) in each thread [32]
  --nworker           the number of threads used for training [16]
  --beta              parameter of RMSProp [0.97]
  --eps               parameter of RMSProp [1e-06]
  --save              file name to save the model []
  --load              file name to load the model []

See the paper for more details on training.

Testing a trained model

After training, you can see the model playing by calling function test() which will display the game in a browser window. But you have to have display package to see the game play. For example, if you saved a trained model using --save /tmp/model.t7 option, then you can load the model using option --load /tmp/model.t7 --epochs 0 and then run test() command to see it's playing.

Requirements

The whole code is written in Lua, and requires Torch7 and nngraph packages. The training uses multi-threading for speed up. Display package is necessary for visualizing the game play, which can be installed by

luarocks install display