
Multi-agent reinforcement learning environment

Primary language: C++. License: MIT.


Gym-Battlesnake is a multi-agent reinforcement learning environment inspired by the annual Battlesnake event held in Victoria, BC. It conforms to the OpenAI Gym interface.



  • Multi-threaded game implementation written in fast C++
  • Single agent training with multiple other agents as opponents
  • Render mode available to see your agents play



Gym-Battlesnake has only been tested on Ubuntu 18.04. Install the dependencies using the command:

sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev libsfml-dev

You will also need to install tensorflow or tensorflow-gpu (TensorFlow 2.0 has not been tested); see https://www.tensorflow.org/install/pip.
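Because TensorFlow 2.0 is untested, it can help to check the installed version before training. The following is a minimal sketch only: the helper names `is_tested_tf_version` and `check_installed_tf` are hypothetical and not part of Gym-Battlesnake, and the rule "major version below 2" is an assumption drawn from the note above.

```python
def is_tested_tf_version(version):
    """Return True if the version string has a major version below 2.

    Assumption: only TensorFlow 1.x counts as tested, since 2.0 is not.
    """
    major = int(version.split(".")[0])
    return major < 2


def check_installed_tf():
    """Return True/False for the installed TensorFlow, or None if it is missing."""
    try:
        import tensorflow as tf  # imported lazily so the helper works without it
    except ImportError:
        return None
    return is_tested_tf_version(tf.__version__)
```

Calling `check_installed_tf()` inside the virtual environment used for training gives a quick indication of whether the installed release falls in the tested range.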

Install using pip

Clone this repository using the following command:

git clone https://github.com/ArthurFirmino/gym-battlesnake

Change into the directory and install using pip (consider setting up a Python virtual environment first):

cd gym-battlesnake
pip install -e .


Single agent training:

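from gym_battlesnake.gymbattlesnake import BattlesnakeEnv
from gym_battlesnake.custompolicy import CustomPolicy
from stable_baselines import PPO2

# 16 game environments spread over 4 threads
env = BattlesnakeEnv(n_threads=4, n_envs=16)

# Train a PPO2 agent, then save it so that it can be reloaded below
model = PPO2(CustomPolicy, env, verbose=1, learning_rate=1e-3)
model.learn(total_timesteps=100000)  # illustrative step budget; tune as needed
model.save('ppo2_trainedmodel')

# Discard the in-memory model and reload it from disk
del model
model = PPO2.load('ppo2_trainedmodel')

# Run the trained agent and render the games
obs = env.reset()
for _ in range(10000):
    action, _ = model.predict(obs)
    obs, _, _, _ = env.step(action)
    env.render()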
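
To get a rough sense of how an agent is doing, the rollout loop above can be wrapped in a function that records rewards instead of discarding them. This is a sketch under assumptions: `mean_step_reward` is a hypothetical helper, and it assumes that `env.step` returns one reward per environment, as vectorized environments in stable-baselines do.

```python
def mean_step_reward(model, env, n_steps=1000):
    """Average reward per environment per step over n_steps vectorized steps.

    Assumes env.step returns (obs, rewards, dones, infos), where rewards
    holds one value per parallel environment.
    """
    obs = env.reset()
    total = 0.0
    count = 0
    for _ in range(n_steps):
        action, _ = model.predict(obs)
        obs, rewards, _, _ = env.step(action)
        total += float(sum(rewards))
        count += len(rewards)
    return total / count if count else 0.0
```

Comparing this value before and after a call to `model.learn` gives a coarse signal of whether training is helping, although the figure depends on how the environment assigns rewards.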

Multi agent training:

from gym_battlesnake.gymbattlesnake import BattlesnakeEnv
from gym_battlesnake.custompolicy import CustomPolicy
from stable_baselines import PPO2

num_agents = 4
# The placeholder environment is only used to construct the models
placeholder_env = BattlesnakeEnv(n_threads=4, n_envs=16)
models = [PPO2(CustomPolicy, placeholder_env, verbose=1, learning_rate=1e-3) for _ in range(num_agents)]

# Train each model in turn against all the others, for 10 rounds
for _ in range(10):
    for model in models:
        env = BattlesnakeEnv(n_threads=4, n_envs=16, opponents=[m for m in models if m is not model])
        model.set_env(env)
        model.learn(total_timesteps=100000)  # illustrative step budget; tune as needed
        env.close()

# Watch the first model play against the others
model = models[0]
env = BattlesnakeEnv(n_threads=1, n_envs=1, opponents=[m for m in models if m is not model])
obs = env.reset()
for _ in range(10000):
    action, _ = model.predict(obs)
    obs, _, _, _ = env.step(action)
    env.render()
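
The multi-agent example trains each model in turn against all of the others. That rotation can be isolated into a small generator, which makes the schedule easy to inspect without starting any environments. The names `opponents_of` and `round_robin` are hypothetical helpers for illustration and are not part of Gym-Battlesnake.

```python
def opponents_of(models, learner):
    """Every model except the learner, compared by identity."""
    return [m for m in models if m is not learner]


def round_robin(models, n_rounds):
    """Yield (round_index, learner, opponents) for each training turn."""
    for round_index in range(n_rounds):
        for learner in models:
            yield round_index, learner, opponents_of(models, learner)
```

In a training loop, each yielded `opponents` list would be passed as the `opponents` argument when constructing the environment for that `learner`.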


  • See the OpenAI Gym documentation for more details about its interface
  • See the stable-baselines documentation for more details on its PPO2 implementation and other suitable algorithms
  • For multi-agent training, tensorflow-gpu is recommended, along with a large number of environments (~100) to maximize data transfer to the GPU.
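
To see why the number of environments matters, it helps to work out how much data each PPO2 update consumes. The sketch below assumes that the rollout batch is `n_envs * n_steps` and that it is split evenly into `nminibatches` pieces, with defaults of 128 and 4. Those defaults are assumptions about stable-baselines' PPO2 and should be checked against its documentation; `ppo2_batch_sizes` is a hypothetical helper.

```python
def ppo2_batch_sizes(n_envs, n_steps=128, nminibatches=4):
    """Return (rollout_batch, minibatch) sample counts for one update.

    Assumed defaults: n_steps=128 and nminibatches=4.
    """
    rollout_batch = n_envs * n_steps           # samples collected per update
    minibatch = rollout_batch // nminibatches  # samples per gradient step
    return rollout_batch, minibatch


print(ppo2_batch_sizes(16))   # the n_envs used in the examples
print(ppo2_batch_sizes(100))  # the larger count suggested for GPU training
```

Under these assumptions, moving from 16 to 100 environments raises each minibatch from 512 to 3200 samples, so more data reaches the GPU per transfer.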


Contributing

  1. Fork
  2. Clone and Setup
  3. Develop
  4. Pull Request