Implementation of OpenAI Gym's LunarLander-v2 for the AE4311 Advanced Flight Control course at TU Delft.
*(Demo GIFs: before training | after training (Sarsa/Q-learning) | after training (double DQN))*
Tested on Ubuntu 18.04 with Python 3.6.5. Ensure all Ubuntu repositories are enabled (this can be done with `sudo add-apt-repository universe restricted multiverse`).
```sh
$ sudo apt install -y python3-numpy python3-dev cmake zlib1g-dev libjpeg-dev xvfb xorg-dev python3-opengl libboost-all-dev libsdl2-dev swig
$ git clone https://github.com/Huizerd/lunarlander.git
$ cd lunarlander
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
```
To run the default configuration:
```sh
$ python -m lander
```
Of course, other configurations can be specified; see `config/config.yaml.default` for the default values. An example configuration, saved under `config/config.yaml`, would be:
```yaml
# Environment and agent
ENV_ID: 'LunarLander-v2'
ENV_SEED: 0
AGENT: 'doubledqn'
# Data locations
# NOTE: setting RECORD_DIR to an existing directory will overwrite!
# NOTE: CHECKPOINT_DIR can be anything when CONTINUE is False
RECORD_DIR: 'record/doubledqn/'
CHECKPOINT_DIR: ''
# Run config
EPISODES: 1000
SAVE_EVERY: 100
STATE_BINS: [5, 5, 5, 5, 5, 5, 2, 2] # per state dimension
STATE_BOUNDS: [[-1.0, 1.0], [-1.0, 1.0], [-1.0, 1.0], [-1.0, 1.0],
[-1.0, 1.0], [-1.0, 1.0], [-1.0, 1.0], [-1.0, 1.0]] # per state dimension
VERBOSE: 1 # 0: nothing, 1: plots and saved videos, 2: every episode
CONTINUE: False
# Learning parameters
# Format: start value, end value, # of episodes, decay factor
# NOTE: linear slope between start and end, then exponential decay
# NOTE: use 0 as # of episodes to do only exponential decay
E_GREEDY: [1.0, 0.05, 1e5, 0.97]
LEARNING_RATE: [5e-5, 5e-5, 0, 1]
DISCOUNT_RATE: 0.99
# Network parameters
L2_REG: 1e-6
UPDATE_EVERY: 100
REPLAY_MEMORY_SIZE: 1e6
BATCH_SIZE: 64
LAYER_SIZES: [512, 512, 512]
```
This configuration would then be used as follows:

```sh
$ python -m lander -c config/config.yaml
```
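The learning-parameter schedules follow the format given in the comments above: start value, end value, number of episodes for the linear slope, and exponential decay factor. A minimal sketch of how such a schedule could be evaluated (a plausible reading of those comments, not necessarily the repo's exact code):

```python
def schedule_value(schedule, episode):
    """Evaluate a [start, end, episodes, decay] schedule at a given episode:
    linear slope from start to end over `episodes` episodes, followed by
    exponential decay with factor `decay` (decay only when episodes is 0)."""
    start, end, n_episodes, decay = schedule
    n_episodes = int(n_episodes)
    if episode < n_episodes:
        # Linear interpolation phase
        return start + (end - start) * episode / n_episodes
    # Exponential decay phase, starting from the end value
    return end * decay ** (episode - n_episodes)

epsilon = schedule_value([1.0, 0.05, 1e5, 0.97], episode=50)  # still on the linear slope
```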
As of now, the available agents are:
- Random
- Sarsa
- Q-learning
- Double deep Q-learning
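The tabular agents (Sarsa and Q-learning) differ only in the bootstrap target of their update rule. As a minimal, illustrative sketch on a discretized state space (not the repo's actual code):

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))   # tabular action-value function
alpha, gamma = 0.1, 0.99              # learning rate, discount rate

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from the action actually taken in s_next
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstrap from the greedy action in s_next
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```

Double deep Q-learning replaces the table with a neural network (see the network parameters above) and decouples action selection from action evaluation to reduce overestimation of Q-values.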
Additional information about the environment can be found on the environment's webpage or in its source code.
There are four discrete actions the lander can take:
- `0`: Do nothing
- `1`: Fire left thruster
- `2`: Fire main thruster
- `3`: Fire right thruster
The state vector consists of eight variables (in this order) between -1 and 1:
- Lander position in x
- Lander position in y
- Lander velocity in x
- Lander velocity in y
- Lander angle
- Lander angular velocity
- Contact left landing leg
- Contact right landing leg
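As a quick illustration of the action and state spaces, the environment can be stepped manually. This uses the pre-Gymnasium Gym API matching this repo's Python 3.6 setup, where `reset()` returns the observation and `step()` returns a 4-tuple:

```python
import gym

env = gym.make('LunarLander-v2')
env.seed(0)                              # same role as ENV_SEED in the config
state = env.reset()                      # 8-dimensional state vector as listed above
state, reward, done, info = env.step(2)  # action 2: fire main thruster
print(state[0], state[1])                # lander position in x and y
env.close()
```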
To make the learning problem (more) tractable, the state can be discretized into a configurable number of bins per dimension via the `STATE_BINS` and `STATE_BOUNDS` settings.
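A minimal sketch of how such binning could look, based on the `STATE_BINS` and `STATE_BOUNDS` settings shown earlier (the repo's actual implementation may differ):

```python
import numpy as np

STATE_BINS = [5, 5, 5, 5, 5, 5, 2, 2]   # bins per state dimension
STATE_BOUNDS = [(-1.0, 1.0)] * 8        # value range per dimension

def discretize(state):
    """Map a continuous 8-dim state to a tuple of bin indices."""
    indices = []
    for value, n_bins, (low, high) in zip(state, STATE_BINS, STATE_BOUNDS):
        edges = np.linspace(low, high, n_bins + 1)[1:-1]  # interior bin edges
        indices.append(int(np.digitize(value, edges)))    # index in [0, n_bins)
    return tuple(indices)  # hashable key for a tabular Q-function
```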
A grid search over different hyperparameter combinations can be performed using `grid_search.py`:

```sh
$ python lander/grid_search.py
```

This uses the default configuration and grid. Custom variants can be called like this:

```sh
$ python lander/grid_search.py -c config/config.yaml -g grid/grid_search.yaml
```
The goal of performing a grid search is twofold: finding the best hyperparameters for the problem, and performing a sensitivity analysis on them.
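At its core, such a grid search enumerates the Cartesian product of candidate values and trains an agent for each combination. An illustrative sketch (the parameter names come from the config above, but the candidate values and grid file format here are hypothetical; see `grid/grid_search.yaml` for the real one):

```python
from itertools import product

grid = {
    'LEARNING_RATE': [1e-4, 5e-5, 1e-5],
    'DISCOUNT_RATE': [0.95, 0.99],
    'BATCH_SIZE': [32, 64],
}

for values in product(*grid.values()):
    overrides = dict(zip(grid.keys(), values))
    print(overrides)  # in practice: merge into the base config, train, log results
```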