Arena: A Scalable and Configurable Benchmark for Policy Learning

Arena is a scalable and configurable benchmark for policy learning. It is an object-based, game-like environment whose logic is reminiscent of classic games such as Pac-Man and Bomberman. An instance of the Arena benchmark starts with an arbitrarily sized region (the arena) containing a controllable agent along with an arbitrary number of destructible obstacles, enemies, and collectible coins. The agent can move in four directions, fire projectiles, and place bombs. The goal is to collect as many coins as possible in the shortest amount of time, killing enemies and destroying obstacles with projectiles and bombs along the way.
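The action space described above (four movement directions, firing, and bomb placement) can be sketched as a small enum. This is an illustrative sketch only; the actual action names and encodings used by Arena may differ.

```python
from enum import Enum

# Hypothetical sketch of the action space described above;
# the real names/values inside Arena may differ.
class Action(Enum):
    UP = 0
    DOWN = 1
    LEFT = 2
    RIGHT = 3
    FIRE = 4   # launch a projectile
    BOMB = 5   # place a bomb
    NOOP = 6   # do nothing this step

print([a.name for a in Action])
```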

Installation

The core of Arena requires only the following dependencies:

  • numpy
  • pygame

Clone the repo and install with pip.

git clone https://github.com/Sirui-Xu/Arena.git
cd Arena/
pip install -e .

How to play the game yourself

cd examples/
python play.py

Use w, s, a, and d to move, space to place a bomb, and j to fire a projectile.

Getting started

Here's an example of creating a configured Arena game instance, which can then be wrapped for interaction:

from arena import Arena

game = Arena(width=1280,
             height=720,
             object_size=32,
             obstacle_size=40,
             num_coins=50,
             num_enemies=50,
             num_bombs=3,
             explosion_max_step=100,
             explosion_radius=128,
             num_projectiles=3,
             num_obstacles=200,
             agent_speed=8,
             enemy_speed=8,
             p_change_direction=0.01,
             projectile_speed=32,
             visualize=True,
             reward_decay=0.99)

To test scalability, vary the map size and the number of objects across runs.
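A scalability sweep might be sketched as follows. The parameter names match the constructor call above, but the specific values chosen here are illustrative assumptions, not recommendations:

```python
# Illustrative scalability sweep over arena size and object counts.
# Parameter names follow the Arena constructor shown above; the
# specific values are arbitrary choices for demonstration.
base = dict(object_size=32, obstacle_size=40, agent_speed=8,
            enemy_speed=8, visualize=False, reward_decay=0.99)

sweep = []
for scale in (1, 2, 4):
    cfg = dict(base,
               width=640 * scale,
               height=360 * scale,
               num_coins=25 * scale,
               num_enemies=25 * scale,
               num_obstacles=100 * scale)
    sweep.append(cfg)

# Each config could then be instantiated as Arena(**cfg).
print([c["width"] for c in sweep])
```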

Next we configure and initialize Wrapper:

from arena import Wrapper

p = Wrapper(game)
p.init()

You are free to use any agent with the Wrapper. Below we create a fictional agent and grab the valid actions:

myAgent = MyAgent(p.getActionSet())
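MyAgent is fictional; a minimal stand-in that satisfies the interface used below (a pickAction(reward, state) method returning one of the valid actions) might look like this. The action names passed in here are placeholders, not Arena's real action set:

```python
import random

class MyAgent:
    """Minimal stand-in for the fictional agent above: it ignores
    the reward and state and picks a random valid action each step."""

    def __init__(self, action_set):
        self.action_set = list(action_set)

    def pickAction(self, reward, state):
        return random.choice(self.action_set)

# Placeholder action names; in practice pass p.getActionSet().
agent = MyAgent(["up", "down", "left", "right", "fire", "bomb"])
print(agent.pickAction(0.0, None))
```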

We can now have our agent, with the help of Wrapper, interact with the game over a certain number of frames:

nb_frames = 1000
reward = 0.0
state = p.reset()  # get the initial observation before the loop

for f in range(nb_frames):
    action = myAgent.pickAction(reward, state)
    state, reward, game_over, info = p.step(action)
    if game_over:  # restart when the episode ends
        state = p.reset()

Just like that, our agent is interacting with the game environment. For a complete example, see examples/test.py.
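The same loop can be run end-to-end against a stub environment. The StubEnv class below is a hypothetical stand-in for the Wrapper, written only to mimic the (state, reward, game_over, info) step convention and the reset() call used above:

```python
import random

class StubEnv:
    """Hypothetical stand-in for Wrapper: each step yields a reward
    of 1.0, and an episode ends after a fixed number of steps."""

    def __init__(self, episode_len=10):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return {"t": self.t}  # toy observation

    def step(self, action):
        self.t += 1
        game_over = self.t >= self.episode_len
        return {"t": self.t}, 1.0, game_over, {}

p = StubEnv()
state = p.reset()
reward, total, episodes = 0.0, 0.0, 0

for f in range(25):
    action = random.choice(["up", "down", "fire"])  # placeholder actions
    state, reward, game_over, info = p.step(action)
    total += reward
    if game_over:
        episodes += 1
        state = p.reset()

print(total, episodes)  # 25 steps -> 25.0 total reward, 2 finished episodes
```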

Test heuristic policy

cd example
python algorithm.py --algorithm ${algorithm_name} --store_data

${algorithm_name} should be something like random.

Train GNN policy

cd example
python train.py --dataset ${data_path} --checkpoints_path ${checkpoints_path} --model ${model_name}

Test GNN policy

cd example
python test.py --checkpoints_path ${checkpoints_path}

Train DQN agent (AX0)

cd examples/rl_dqgnn
python train_dqgnn.py --train --model_path ${path to save model} --num_episode 5000 --num_rewards 5

Visualizing DQN policy

cd examples/rl_dqgnn
python eval_dqgnn.py --model_path ${path to load model}

Acknowledgements

We referred to the PyGame Learning Environment for parts of the implementation.