This project aims to use deep reinforcement learning to play the game QWOP.
It is the first in a series of collaboration projects between PTStephD and Kirkados.
The core deep reinforcement learning algorithm is the Distributional Deep Q Learning algorithm, first presented by Bellemare et al. in 2017. A number of enhancements developed by other researchers are used as well, namely:
- Parallel actors and learners
- N-step returns
- Prioritized experience replay

The algorithm is written in TensorFlow 1.15.
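To illustrate the N-step returns enhancement listed above, here is a minimal scalar sketch. The function name `n_step_return` and its arguments are hypothetical and are not taken from this repository; in the distributional setting the same discounted reward sum would shift the value distribution's support rather than a single scalar estimate.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    rewards:         the n rewards collected after the starting state (hypothetical input)
    bootstrap_value: value estimate of the state reached after n steps
    gamma:           discount factor
    """
    g = bootstrap_value
    # Fold the rewards in from last to first so each step applies one more discount.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

Compared with a one-step target, the bootstrap estimate is discounted by `gamma` raised to the number of steps, so more of the target comes from observed rewards.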
Special thanks to:
- [msinto93](https://github.com/msinto93)
- [SuReLI](https://github.com/SuReLI)
- [DeepMind](https://github.com/deepmind)
- [OpenAI](https://github.com/openai)

for publishing their code! The open-source mindset of AI research is fantastic.
Incentivizing the agent to run down the track (positive rewards are given for forward velocity): https://youtu.be/OYBiUWuA4Ho
Incentivizing the agent to run down the track AND perform front flips: https://youtu.be/16JEWNf6468
To run the training algorithm, edit `settings.py` and `environment_qwop.py` as appropriate, and then run `python3 main.py` from a terminal. The default parameters will cause the agent to run down the track, as shown in the first video above. The code is CPU-intensive and takes days to train on a modern computer.
In addition to Python, the following python3 packages must be installed:
- psutil: `pip3 install psutil`
- TensorFlow: `pip3 install tensorflow`, or `pip3 install tensorflow-gpu` for GPU compatibility (additional steps required)
- box2d: `pip3 install box2d-py`
- matplotlib: `pip3 install matplotlib`
- OpenAI gym: `pip3 install gym[all]`
- virtual display: `pip3 install pyvirtualdisplay`

The following Linux packages must also be installed:
- OpenGL: `sudo apt-get install python-opengl`
- xvfb: `sudo apt-get install xvfb`
- ffmpeg: `sudo apt-get install ffmpeg`
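
Before starting a multi-day training run, it may help to confirm that the Python packages above can be found. The following is a small sketch that is not part of this repository; the helper name `missing_modules` is hypothetical. Note that `box2d-py` is imported under the module name `Box2D`.

```python
import importlib.util

# Import names for the packages listed above; box2d-py installs the module "Box2D".
REQUIRED_MODULES = ["psutil", "tensorflow", "Box2D", "matplotlib", "gym", "pyvirtualdisplay"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only checks that the modules are discoverable; it does not verify versions or the Linux packages.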
A QWOP dynamics environment was developed from first principles and is contained in `environment_qwop.py`. It consists of a stick figure with a torso, two arms, and two legs. The goal is to press the buttons `Q`, `W`, `O`, and `P` to make the stick figure translate down the track as fast as possible.
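
One way to expose the four buttons to a Q-learning agent is as a discrete action set covering every on/off combination. The sketch below is an assumption made for illustration only; the actual action encoding in `environment_qwop.py` may differ, and the names `BUTTONS`, `ACTIONS`, and `pressed_buttons` are hypothetical.

```python
from itertools import product

BUTTONS = ("Q", "W", "O", "P")

# Every on/off combination of the four buttons: 2**4 = 16 discrete actions.
# Index 0 presses nothing; index 15 presses all four.
ACTIONS = list(product((False, True), repeat=len(BUTTONS)))


def pressed_buttons(action_index):
    """Map a discrete action index to the list of buttons held down."""
    return [button for button, on in zip(BUTTONS, ACTIONS[action_index]) if on]


if __name__ == "__main__":
    for index in range(len(ACTIONS)):
        print(index, pressed_buttons(index))
```

With this encoding the network would need one output head (or one value distribution) per action index.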