
An AI agent that learns from raw visual input in the ViZDoom environment, implemented with Theano and Lasagne.

The code implements Double DQN with a dueling network architecture.
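To make the two techniques concrete, here is a minimal NumPy sketch of the dueling aggregation and the Double DQN target. It is not taken from this repository (which uses Theano and Lasagne); the function names, argument names, and the mean-advantage baseline are assumptions chosen for illustration.

```python
import numpy as np


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a).

    Assumes the mean-advantage baseline: Q = V + A - mean(A).
    state_value: shape (batch,), advantages: shape (batch, n_actions).
    """
    state_value = np.asarray(state_value, dtype=np.float64).reshape(-1, 1)
    advantages = np.asarray(advantages, dtype=np.float64)
    return state_value + advantages - advantages.mean(axis=1, keepdims=True)


def double_dqn_targets(rewards, terminals, q_online_next, q_target_next, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    The online network selects the next action, and the target network
    evaluates it. Terminal transitions receive no bootstrapped value.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    terminals = np.asarray(terminals, dtype=bool)
    q_online_next = np.asarray(q_online_next, dtype=np.float64)
    q_target_next = np.asarray(q_target_next, dtype=np.float64)

    # Action selection uses the online network's estimates.
    best_actions = q_online_next.argmax(axis=1)
    # Action evaluation uses the target network's estimates.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * np.where(terminals, 0.0, next_values)
```

In a full agent, the Q-values fed to double_dqn_targets would themselves come from dueling heads on both the online and the target network.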

Videos of early results (recorded before Double DQN and the dueling architecture were added, and with some bugs still present): https://www.youtube.com/watch?v=re6hkcTWVUY


The code requires vizdoom.so and the vizdoom binary to be present in the root directory. Config files and scenarios are also required; both can be found in the ViZDoom repository.

Usage of the learning script:

usage: learn.py [-h] [--load-agent <AGENT_FILE>] [--list]
                [--load-json <JSON_FILE>] [--config-file <CONFIG_FILE>]
                [--name <NAME>] [--no-save] [--no-save-results]
                [--no-save-best] [--epochs <EPOCHS_NUM>]
                [--train-steps <TRAIN_STEPS>]
                [--test-episodes <TEST_EPISODES_NUM>] [--no-tqdm]

Learning script for ViZDoom.

positional arguments:
  agent                 agent function name from agents.py

optional arguments:
  -h, --help            show this help message and exit
  --load-agent <AGENT_FILE>, -l <AGENT_FILE>
                        load agent from a file
  --list                lists agents available in agents.py
  --load-json <JSON_FILE>, -j <JSON_FILE>
                        load agent's specification from a json file
  --config-file <CONFIG_FILE>, -c <CONFIG_FILE>
                        configuration file (used only when loading agent or
                        using json)
  --name <NAME>, -n <NAME>
                        agent's name (affects savefiles)
  --no-save             do not save agent's parameters
  --no-save-results     do not save agent's results
  --no-save-best        do not save the best agent
  --epochs <EPOCHS_NUM>, -e <EPOCHS_NUM>
                        number of epochs (default: infinity)
  --train-steps <TRAIN_STEPS>
                        training steps per epoch (default: 200k)
  --test-episodes <TEST_EPISODES_NUM>
                        testing episodes per epoch (default: 300)
  --no-tqdm             do not use tqdm progress bar

Usage of the script for watching:

usage: watch.py [-h] [--config-file [config_file]] [--episodes [episodes]]
                [--no-watch] [--action-sleep [action_sleep]]
                [--episode-sleep [episode_sleep]]

A script to watch agents play or test them.

positional arguments:
  agent_file            file with the agent

optional arguments:
  -h, --help            show this help message and exit
  --config-file [config_file], -c [config_file]
                        override agent's configuration file
  --episodes [episodes], -e [episodes]
                        run this many episodes (default 20)
  --no-watch            do not display the window and do not sleep
  --action-sleep [action_sleep], -s [action_sleep]
                        sleep this many seconds after each action
  --episode-sleep [episode_sleep]
                        sleep this many seconds after each episode

Usage of the plotting script:

usage: plot_results.py [-h] [--stats <STAT> [<STAT> ...] | --list |
                       --x-resolution <X_RESOLUTION>]
                       files [files ...]

This script plots results generated by learn.py.

positional arguments:
  files                 file(s) with results

optional arguments:
  -h, --help            show this help message and exit
  --stats <STAT> [<STAT> ...], -s <STAT> [<STAT> ...]
                        plot given stats e.g. mean, train_mean, std ...
  --list                lists available stats for all files and exit
  --x-resolution <X_RESOLUTION>, -r <X_RESOLUTION>
                        interval for x axis in number of training actions
                        (default: 1000000)