
Learning-based MCTS in the Pommerman Environment

This repository provides an implementation of learning-based Monte-Carlo Tree Search variants in the Pommerman environment. Our approaches leverage opponent models (planning agents) to transform the multiplayer game into single- and two-player games depending on the provided settings.


The simplest way to get started and execute runs is to build a docker image and run it as a container.

Available backends:

  • TensorRT (NVIDIA GPU required): Tested with TensorRT 8.0.1 and PyTorch 1.9.0.


To use NVIDIA GPUs in docker containers, you have to install docker and nvidia-docker2. Have a look at the installation guide https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html.

Build Scripts

We provide small scripts to facilitate building the image and running experiments.

  1. Build the image

    $ bash docker/build.sh

    This automatically caches the dependencies. If you run it again, only the code is rebuilt. If you want to rebuild the whole image, just call bash docker/build.sh --no-cache.

  2. Specify where the data generated by the experiments should be stored in the environment variable $POMMER_DATA_DIR. You can either export POMMER_DATA_DIR=/some/dir or prefix the command in the next step with POMMER_DATA_DIR=/some/dir.

  3. Create a container and run the training loop (replace --help with the arguments of your choice)

    $ bash docker/run.sh --help
    • Note that --dir and --exec are already specified correctly by docker/run.sh.
    • All GPUs are visible in the container and GPU 0 is used by default. You can select another GPU with, e.g., --gpu 4.
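
If you launch several runs from a Python script, the prefix form from step 2 corresponds to passing a per-process environment to the child process. The following is a minimal sketch under that assumption. run_sh_command and launch are hypothetical helper names; only docker/run.sh, POMMER_DATA_DIR and --gpu come from this README.

```python
import os
import subprocess
from typing import Dict, List, Optional, Sequence, Tuple


def run_sh_command(args: Sequence[str], data_dir: str,
                   gpu: Optional[int] = None) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for `bash docker/run.sh` (hypothetical helper).

    POMMER_DATA_DIR is set only in the returned copy of the environment, which
    mirrors prefixing the shell command with POMMER_DATA_DIR=/some/dir.
    """
    env = dict(os.environ)
    env["POMMER_DATA_DIR"] = os.path.abspath(data_dir)
    cmd = ["bash", "docker/run.sh", *args]
    if gpu is not None:
        # Without --gpu, docker/run.sh uses GPU 0.
        cmd += ["--gpu", str(gpu)]
    return cmd, env


def launch(args: Sequence[str], data_dir: str,
           gpu: Optional[int] = None) -> subprocess.CompletedProcess:
    """Start the container; must be called from the repository root."""
    cmd, env = run_sh_command(args, data_dir, gpu)
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    # Dry run: only print the command that launch() would execute.
    cmd, _ = run_sh_command(["--help"], "/some/dir", gpu=4)
    print(" ".join(cmd))
```

Calling launch(["--help"], "/some/dir", gpu=4) from the repository root should be equivalent to POMMER_DATA_DIR=/some/dir bash docker/run.sh --help --gpu 4. Working on a copy of the environment leaves the calling process unchanged.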

Manual Docker Build

Of course, you can also build and run the image manually. Have a closer look at the scripts from the previous section for details.

Additional notes:

  • You can restrict a container's GPU access with --gpus device=4. Alternatively, you can use PommerLearn's own --gpu argument.
  • Warning: If you use rootless Docker, the container will probably run out of shared memory. Adding --ipc=host or --shm-size=32g to the docker run command helps. docker/run.sh does this by default.


Search Approaches

  1. Generate a supervised learning (SL) dataset with 1 million samples (1000 chunks of 1000 samples each) by running

    $POMMER_EXEC --mode=ffa_sl --max-games=-1 --chunk-size=1000 --chunk-count=1000 --log --file-prefix=./1M_simple

    where $POMMER_EXEC is either your PommerLearn executable or MODE=exec bash docker/run.sh.

  2. Train the SL model: run pommerlearn/training/train_cnn.py with the following arguments modified (see the bottom of the file):

    "dataset_path": "1M_simple_0.zr",
    "test_size": 0.01,
    "output_dir": "./model-sl"

    and save the resulting model as $POMMER_DATA_DIR/model-sl.

  3. Generate a dummy model by running pommerlearn/debug/create_dummy_model.py and save it as $POMMER_DATA_DIR/model-dummy

  4. You can now perform search experiments with both models. Use POMMER_1VS1=false MODE=exec bash docker/run.sh for the single-player search and POMMER_1VS1=true MODE=exec bash docker/run.sh for the two-player search.

  5. To reproduce our results, generate five SL models and five dummy models and label them with the suffixes -0 to -4. Then navigate into the docker directory and run the search experiments with

    ./docker $ bash search_experiments.sh

    The results will be recorded in a single csv file.
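
Steps 2, 3 and 5 all end with copying a model directory into $POMMER_DATA_DIR under a fixed name. A minimal sketch of this bookkeeping in Python follows. It assumes that each model is a directory (such as the output_dir from step 2) and that the repetitions are named model-sl-0 to model-sl-4 and model-dummy-0 to model-dummy-4; check docker/search_experiments.sh for the names it actually expects. install_model and install_repetitions are hypothetical names.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional, Sequence


def install_model(model_dir: str, name: str, data_dir: Optional[str] = None) -> Path:
    """Copy a trained or dummy model directory to <data_dir>/<name>.

    data_dir defaults to $POMMER_DATA_DIR. Existing targets are never overwritten.
    """
    root = Path(data_dir) if data_dir is not None else Path(os.environ["POMMER_DATA_DIR"])
    target = root / name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(model_dir, target)  # also creates missing parent directories
    return target


def install_repetitions(sl_dirs: Sequence[str], dummy_dirs: Sequence[str],
                        data_dir: Optional[str] = None) -> List[Path]:
    """Install five SL and five dummy models with the (assumed) suffixes -0 to -4."""
    if len(sl_dirs) != 5 or len(dummy_dirs) != 5:
        raise ValueError("expected five SL and five dummy model directories")
    installed = []
    for i, (sl_dir, dummy_dir) in enumerate(zip(sl_dirs, dummy_dirs)):
        installed.append(install_model(sl_dir, f"model-sl-{i}", data_dir))
        installed.append(install_model(dummy_dir, f"model-dummy-{i}", data_dir))
    return installed
```

For a single run, install_model("./model-sl", "model-sl") covers step 2. Refusing to overwrite existing targets keeps a stale model from silently replacing one that earlier experiments used.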

Reinforcement Learning

Navigate into the docker directory and run the RL experiments with

./docker $ bash rl_experiments.sh

This creates a new directory in your working directory to store the training logs. The results are stored in $POMMER_DATA_DIR/archive and the TensorBoard runs in $POMMER_DATA_DIR/runs.

Team Mode Experiments

To perform experiments in team mode, collect samples with the option --mode=team_sl and otherwise proceed as in FFA mode, e.g.

$POMMER_EXEC --mode=team_sl --max-games=-1 --chunk-size=1000 --chunk-count=1000 --log --file-prefix=./1M_simple_team

You can then run pommerlearn/training/train_cnn.py on the generated dataset. It automatically uses the value targets for the team mode, based on the meta information stored in the dataset.
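
The FFA and team commands differ only in --mode and --file-prefix, and the dataset size follows from --chunk-size times --chunk-count (1000 times 1000 = 1 million in the examples above). A hypothetical Python helper that assembles the command, assuming this relationship holds, could look as follows. The executable is passed as a list so that both ./PommerLearn and bash docker/run.sh (with MODE=exec in the environment) fit.

```python
from typing import List, Sequence, Tuple


def sl_generation_command(executable: Sequence[str], mode: str = "ffa_sl",
                          chunk_size: int = 1000, chunk_count: int = 1000,
                          file_prefix: str = "./1M_simple") -> Tuple[List[str], int]:
    """Return the argv for SL sample generation and the expected sample count.

    Assumes the dataset holds chunk_size * chunk_count samples, matching the
    1000 x 1000 = 1 million samples of the examples in this README.
    """
    if mode not in ("ffa_sl", "team_sl"):
        raise ValueError(f"unsupported mode: {mode}")
    argv = [
        *executable,
        f"--mode={mode}",
        "--max-games=-1",  # copied from the README examples
        f"--chunk-size={chunk_size}",
        f"--chunk-count={chunk_count}",
        "--log",
        f"--file-prefix={file_prefix}",
    ]
    return argv, chunk_size * chunk_count


if __name__ == "__main__":
    argv, n = sl_generation_command(["./PommerLearn"], mode="team_sl",
                                    file_prefix="./1M_simple_team")
    print(" ".join(argv), f"# {n} samples")
```

The resulting list can be passed to subprocess.run; for the Docker variant, add MODE=exec to the environment of that call.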


Manual Installation of Dependencies

For the Python side:

  • Python 3.7 and pip

    We recommend using a virtual environment. This guide uses Anaconda. Create an environment named pommer with

    $ conda create -n pommer python=3.7

For the C++ side:

  • Essential build tools: gcc, make, cmake

    $ sudo apt install build-essential cmake
  • The dependencies z5, xtensor, Boost, nlohmann/json and blosc can be installed directly with conda in the pommer environment:

    (pommer) $ conda install -c conda-forge z5py xtensor boost nlohmann_json blosc
  • Blaze needs to be installed manually. It can be unpacked anywhere; it does not have to be /usr/local. For further information, refer to the Blaze installation guide or the Dockerfiles in this repository. From the unpacked Blaze directory, run

    cmake -DCMAKE_INSTALL_PREFIX=/usr/local/ .
    sudo make install
    export BLAZE_PATH=/usr/local/include/
  • Manual installation of TensorRT (not Torch-TensorRT), including CUDA and cuDNN. Please refer to the installation guide by NVIDIA https://developer.nvidia.com/tensorrt-getting-started.

Clone Repository

This repository depends on submodules. Clone it and initialize all submodules with

$ git clone git@gitlab.com:jweil/PommerLearn.git && \
  cd PommerLearn && \
  git submodule update --init --recursive

Build Instructions

  1. The current version requires you to set the following environment variables:

    • CONDA_ENV_PATH: path of your conda environment (e.g. ~/conda/envs/pommer)
    • BLAZE_PATH: blaze installation path (e.g. /usr/local/include)
    • CUDA_PATH: cuda installation path (e.g. /usr/local/cuda)
    • TENSORRT_PATH (when using the CrazyAra TensorRT backend, e.g. /usr/src/tensorrt)
    • [Torch_DIR] (when using the CrazyAra Torch backend, currently untested)
  2. Build the C++ environment with the provided CMakeLists.txt. To use TensorRT >= 8 (recommended), you have to specify -DUSE_TENSORRT8=ON.

/PommerLearn/build $ cmake -DCMAKE_BUILD_TYPE=Release -DUSE_TENSORRT8=ON -DCMAKE_CXX_COMPILER="$(which g++)" ..
/PommerLearn/build $ make VERBOSE=1 all -j8
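
A missing or mistyped variable may only surface later as a CMake or compiler error, so you might want to check the variables before running cmake. The following is a minimal sketch, assuming that each variable should point to an existing directory; the optional Torch_DIR is not checked, and check_build_env is a hypothetical name.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Variables from the build instructions; TENSORRT_PATH only for the TensorRT backend.
REQUIRED_VARS = ("CONDA_ENV_PATH", "BLAZE_PATH", "CUDA_PATH")


def check_build_env(env: Optional[Mapping[str, str]] = None,
                    use_tensorrt: bool = True) -> List[str]:
    """Return a list of problems with the build environment (empty if none)."""
    env = os.environ if env is None else env
    names = list(REQUIRED_VARS) + (["TENSORRT_PATH"] if use_tensorrt else [])
    problems = []
    for name in names:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).expanduser().is_dir():
            problems.append(f"{name}={value} is not an existing directory")
    return problems


if __name__ == "__main__":
    for issue in check_build_env():
        print("warning:", issue)
```

Running the script in the same shell in which you call cmake prints one warning per missing or invalid variable and nothing if the setup looks complete.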

Run Instructions

Optional: You can install PyTorch 1.9.0 with GPU support via

(pommer) $ conda install -y pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.1 -c conda-forge -c pytorch

The remaining python runtime dependencies can be installed with

(pommer) $ pip install -r requirements.txt

Before starting the RL loop, you can check whether everything is set up correctly by creating a dummy model and loading it in the C++ executable:

(pommer) /PommerLearn/build $ python ../pommerlearn/debug/create_dummy_model.py
(pommer) /PommerLearn/build $ ./PommerLearn --mode=ffa_mcts --model=./model/onnx

You can then start training by running

(pommer) /PommerLearn/build $ python ../pommerlearn/training/rl_loop.py

Troubleshooting

Prerequisites and Building

  • Make sure that you've pulled all submodules recursively
  • In older versions of TensorRT, you have to manually comment out using namespace sample; in deps/CrazyAra/engine/src/nn/tensorrtapi.cpp
  • We experienced issues with std::filesystem being undefined when using GCC 7.5.0. We recommend updating to a more recent version, e.g. GCC 11.2.0.

Runtime

  • For runtime issues like libstdc++.so.6: version 'GLIBCXX_3.4.30' not found, try loading your system libraries with export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/. On some systems, ctypes loads the libstdc++ from the conda environment instead of the system library. As a last resort, you can back up the original library with mv /conda-lib-path/libstdc++.so.6 /conda-lib-path/libstdc++.so.6.old and then create a symbolic link with ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /conda-lib-path/libstdc++.so.6.
  • If you encounter errors like ModuleNotFoundError: No module named 'training', set your PYTHONPATH to the pommerlearn directory. For example, export PYTHONPATH=/PommerLearn/pommerlearn.
  • When loading TensorBoard runs, you may get errors like Error: tonic::transport::Error(Transport, hyper::Error(Accept, Os { code: 24, kind: Other, message: "Too many open files" })). Passing --load_fast=false to tensorboard might help.
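
To find out which libstdc++ lacks the required symbol version, you can list the GLIBCXX version tags a library contains, similar to strings libstdc++.so.6 | grep GLIBCXX. The sketch below does this in Python. It assumes that the conda copy lives in $CONDA_ENV_PATH/lib (the /conda-lib-path above), and glibcxx_versions and provides_glibcxx are hypothetical names.

```python
import os
import re
from pathlib import Path
from typing import Set

# Version tags embedded in libstdc++ look like b"GLIBCXX_3.4.30".
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(lib_path) -> Set[str]:
    """Return all GLIBCXX version tags found in a shared library file."""
    data = Path(lib_path).read_bytes()  # follows the libstdc++.so.6 symlink
    return {m.group(1).decode("ascii") for m in _GLIBCXX_TAG.finditer(data)}


def provides_glibcxx(lib_path, version: str = "3.4.30") -> bool:
    """True if the library advertises the given GLIBCXX symbol version."""
    return version in glibcxx_versions(lib_path)


if __name__ == "__main__":
    candidates = ["/usr/lib/x86_64-linux-gnu/libstdc++.so.6"]
    conda_env = os.environ.get("CONDA_ENV_PATH")
    if conda_env:
        # Assumption: the conda copy is located in $CONDA_ENV_PATH/lib.
        candidates.append(os.path.join(conda_env, "lib", "libstdc++.so.6"))
    for path in candidates:
        if Path(path).exists():
            print(path, provides_glibcxx(path))
```

If only the system library reports True, the LD_LIBRARY_PATH or symlink workaround above is the likely fix.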

Performance Profiling

You can install gprof2dot to plot the gprof output as a call graph: https://github.com/jrfonseca/gprof2dot

Activate the CMake option USE_PROFILING in CMakeLists.txt and rebuild. Run the executable and generate the plot:

./PommerLearn --mode ffa_mcts --max-games 10
gprof PommerLearn | gprof2dot | dot -Tpng -o profile.png
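
If you prefer to generate the plot from a Python script, the shell pipeline above can be reproduced with subprocess. This is a sketch only: profile_commands and run_pipeline are hypothetical helpers, and gprof, gprof2dot and dot must be on your PATH.

```python
import subprocess
from typing import List, Sequence


def profile_commands(executable: str = "./PommerLearn",
                     output: str = "profile.png") -> List[List[str]]:
    """The gprof | gprof2dot | dot pipeline from above, one argv per stage."""
    return [
        ["gprof", executable],  # reads gmon.out from the working directory
        ["gprof2dot"],
        ["dot", "-Tpng", "-o", output],
    ]


def run_pipeline(commands: Sequence[Sequence[str]]) -> List[int]:
    """Connect the commands with pipes like a shell and return their exit codes."""
    procs = []
    upstream = None
    for i, argv in enumerate(commands):
        is_last = i == len(commands) - 1
        proc = subprocess.Popen(list(argv), stdin=upstream,
                                stdout=None if is_last else subprocess.PIPE)
        if upstream is not None:
            # Close the parent's copy so the upstream stage gets SIGPIPE
            # if this stage exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    return [proc.wait() for proc in procs]
```

For example, calling run_pipeline(profile_commands()) in the build directory after the profiling run should write profile.png and return [0, 0, 0] on success.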


If you find this repository helpful, please consider citing our paper

  author={Weil, Jannis and Czech, Johannes and Meuser, Tobias and Kersting, Kristian},
  title={{Know your Enemy: Investigating Monte-Carlo Tree Search with Opponent Models in Pommerman}},
