QWOP

This project aims to use deep reinforcement learning to play the game QWOP.

It is the first in a series of collaboration projects between PTStephD and Kirkados.

The Algorithm

The core deep reinforcement learning algorithm is the Distributional Deep Q Learning algorithm, first presented by Bellemare et al. in 2017. Several enhancements developed by other researchers are used as well, namely:

Parallel actors and learners
N-step returns
Prioritized experience replay

The algorithm is written in TensorFlow 1.15.
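As a rough illustration of two of the enhancements listed above, the sketch below computes an n-step return and proportional prioritized-replay sampling probabilities. It is a minimal standalone example, not the code used in this repository: the function names, the discount factor `gamma`, and the exponent `alpha` are assumptions chosen for illustration.

```python
import numpy as np


def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of n rewards plus a discounted bootstrap value.

    rewards: the n rewards collected after taking an action.
    bootstrap_value: value estimate for the state reached after n steps
        (pass 0.0 if the episode ended within those n steps).
    gamma: discount factor (illustrative default, not taken from settings.py).
    """
    g = bootstrap_value
    # Work backwards so each reward is discounted once per step of delay.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha.

    alpha = 0 gives uniform sampling; larger alpha favours
    high-priority transitions more strongly.
    """
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    return scaled / scaled.sum()
```

For example, `n_step_return([1, 1, 1], 10, gamma=0.5)` evaluates to 3.0, and `sampling_probabilities` always returns values that sum to 1.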

Special thanks to:

[msinto93](https://github.com/msinto93)
[SuReLI](https://github.com/SuReLI)
[DeepMind](https://github.com/deepmind)
[OpenAI](https://github.com/openai)

for publishing their code! The open-source mindset of AI research is fantastic.

Results

Incentivizing the agent to run down the track (positive rewards are given for forward velocity): https://youtu.be/OYBiUWuA4Ho

Incentivizing the agent to run down the track AND perform front flips: https://youtu.be/16JEWNf6468
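The two behaviours above differ only in how the reward is defined. The sketch below shows one way such a reward could be shaped. It is an assumption-laden illustration: `shaped_reward`, `flip_completed`, `velocity_weight`, and `flip_bonus` are hypothetical names, and the actual reward terms live in environment_qwop.py and may be quite different.

```python
def shaped_reward(forward_velocity, flip_completed=False,
                  velocity_weight=1.0, flip_bonus=0.0):
    """Hypothetical per-step reward.

    forward_velocity: speed of the figure along the track.
    flip_completed: whether a front flip finished on this step (assumed signal).
    velocity_weight, flip_bonus: illustrative weights, not values from the repo.
    """
    reward = velocity_weight * forward_velocity
    if flip_completed:
        reward += flip_bonus
    return reward
```

With `flip_bonus=0.0` this reduces to a pure forward-velocity reward. A positive `flip_bonus` additionally rewards flips.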

Usage

To run the training algorithm, edit settings.py and environment_qwop.py as appropriate, then run python3 main.py from a terminal. The default parameters will cause the agent to run down the track, as shown in the first video above. The code is CPU-intensive and takes days to train on a modern computer. In addition to Python 3, the following Python packages must be installed:

psutil: pip3 install psutil
TensorFlow: pip3 install tensorflow, or pip3 install tensorflow-gpu for GPU compatibility (additional steps required)
box2d: pip3 install box2d-py
matplotlib: pip3 install matplotlib
OpenAI gym: pip3 install gym[all]
virtual display: pip3 install pyvirtualdisplay

The following Linux packages must also be installed:

OpenGL: sudo apt-get install python-opengl
xvfb: sudo apt-get install xvfb
ffmpeg: sudo apt-get install ffmpeg
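Before starting a multi-day training run, it can help to confirm that the Python packages listed above are importable. The following is a small sketch for doing so and is not part of the repository. The import names in `REQUIRED_MODULES` are assumptions based on how these packages are usually imported (for instance, box2d-py is imported as `Box2D`).

```python
import importlib.util

# Import names assumed for the pip packages listed in this README.
REQUIRED_MODULES = [
    "psutil",
    "tensorflow",
    "Box2D",            # installed by box2d-py
    "matplotlib",
    "gym",
    "pyvirtualdisplay",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages were found.")
```

This only checks the Python packages. The Linux packages (python-opengl, xvfb, ffmpeg) still need to be verified separately.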

The Environment

A QWOP dynamics environment was developed from first principles and is contained in environment_qwop.py. It consists of a stick figure with a torso, two arms, and two legs. The goal is to press the buttons Q, W, O, and P to make the stick figure translate down the track as fast as possible.
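Since the agent acts by pressing Q, W, O, and P, one natural way to expose these to a Q-learning agent is as a discrete action index over key combinations. The sketch below enumerates all 16 pressed/released combinations. This encoding is an assumption made for illustration, and `KEYS`, `ACTIONS`, and `decode_action` are hypothetical names. The action space actually defined in environment_qwop.py may be smaller or ordered differently.

```python
from itertools import product

KEYS = ("Q", "W", "O", "P")

# Every pressed/released combination of the four keys.
# Index 0 means no key is held; index 15 means all four are held.
ACTIONS = list(product((False, True), repeat=len(KEYS)))


def decode_action(index):
    """Return the set of keys held down for a discrete action index."""
    return {key for key, pressed in zip(KEYS, ACTIONS[index]) if pressed}
```

A Q-network for this encoding would have one output head per entry in `ACTIONS`, and the chosen index would be decoded into key presses before stepping the simulation.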
