Evolutionary Planning in Latent Space (EPLS) - Paper Implementation

Paper written by Thor V.A.N. Olesen, Dennis Thinh Tan Nguyen, Rasmus Berg Palm and Sebastian Risi: https://arxiv.org/abs/2011.11293

This is the code repository used to demonstrate Evolutionary Planning in Latent Space on a learned world model. In this repository we have extended the World Models implementation to support an iterative training procedure with evolutionary planning, backed by a modular architecture that allows future extensions with new environments, test suites and planning algorithms.

The goal of this work is to use evolution to do online planning in latent space on a learned model of the world. It is inspired by the paper: Ha and Schmidhuber, "World Models", 2018. https://doi.org/10.5281/zenodo.1207631

Similar to the World Models paper, we use a Convolutional Variational Auto Encoder (ConvVAE) to learn a representation of the world and a Mixture Density Recurrent Neural Network (MDRNN) to learn the dynamics of the world.

While the original paper uses a simple linear policy network (controller) to produce actions, we use evolutionary planning algorithms such as Random Mutation Hill Climbing (RMHC) and the Rolling Horizon Evolutionary Algorithm (RHEA), with the world model as a forward model, to perform online planning. More information about RHEA can be found in the paper: Raluca D. Gaina, Sam Devlin, Simon M. Lucas and Diego Perez-Liebana, "Rolling Horizon Evolutionary Algorithms for General Video Game Playing", 2020. https://arxiv.org/pdf/2003.12331.pdf

Finally, we have focused our attention on planning in the single-player real-time game CarRacing-v0 by OpenAI (https://gym.openai.com/envs/CarRacing-v0/), but the system supports other environments too.

Prerequisites

The system is written in Python 3.7 and utilizes PyTorch 1.5. Please refer to https://pytorch.org/ for PyTorch installation details.

The rest of the dependencies are as follows:

- Gym environment dependencies
conda gym[all]
box2d

- Vizdoom
vizdoom - https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md

- Others
tqdm        : Progress bars
numpy       : Matrix operations
matplotlib  : Visualization
tensorboard : Logging
torchvision : Torch related
pytorch     : Torch related
dill        : File compression
colorama    : Enables multiline printing in CMD for Windows

Hyperparameters

The hyperparameters are stored in a JSON file and can be found in config.json. The next few sections only showcase the essentials, but you are welcome to play around with the other parameters.

Generate Data

To generate new rollouts, set the following hyperparameters in the config file:

  1. is_generate_data: true
  2. data_generator: {rollouts: 10000 }

By default, a random policy is used, but a good policy can be enabled with is_ha_agent_driver: true

Models

This section explains how the VAE and MDRNN can be configured for training or reloading.

Training VAE

To train a new VAE model, set the following hyperparameters in the config file:

  1. experiment_name: "SOME_NAME_YOU_DEFINE"
  2. is_train_vae: true
  3. latent_size: 64
  4.  "vae_trainer": {
         "max_epochs": 20,
         "batch_size": 35,
         "learning_rate": 0.0001
     }
    
  5. Run

Reloading VAE

To reload a VAE, set the parameters accordingly:

  1. is_train_vae: false
  2. experiment_name: "SOME_NAME_YOU_DEFINE"

Training MDRNN

To train a new MDRNN model, set the following hyperparameters in the config file:

  1. experiment_name: "SOME_NAME_YOU_DEFINE"

  2. is_train_mdrnn: true

  3. latent_size: 64

  4.  "mdrnn_trainer": {
         "max_epochs": 20,
         "learning_rate": 0.001,
         "sequence_length": 500,
         "train_test_files_ratio": 0.8,
         "is_random_sampling": true
     }
    
  5. "mdrnn":{ "hidden_units": 512 }

  6. Run

NB: The current MDRNN trainer does not support batches with variable sequence lengths. To handle them, you must either extend the MDRNN trainer with padding/packing of sequences or use batch_size = 1.
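If you want to add that support yourself, below is a minimal sketch of how variable-length batches could be padded and packed with PyTorch's built-in RNN utilities. The collate function, tensor shapes and the way the LSTM is called are assumptions for illustration, not the repository's actual dataloader or trainer code.

import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Hypothetical collate function for a DataLoader whose samples are latent
# sequences of different lengths, each with shape (seq_len, latent_size).
def collate_variable_length(batch):
    lengths = torch.tensor([sequence.shape[0] for sequence in batch])
    padded = pad_sequence(batch, batch_first=True)  # (batch, max_len, latent_size)
    return padded, lengths

# Inside the trainer, the padded batch can be packed before the LSTM call so
# that the recurrent updates ignore the padded time steps.
def forward_packed(lstm, padded, lengths):
    packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
    output, hidden = lstm(packed)
    output, _ = pad_packed_sequence(output, batch_first=True)
    return output, hidden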

Reloading MDRNN

To reload an MDRNN, set the parameters accordingly:

  1. is_train_mdrnn: false
  2. experiment_name: "SOME_NAME_YOU_DEFINE"
  3. Ensure that the hidden_units and latent_size are set exactly to the values used to train the MDRNN.

Iterative Training

An iterative procedure has been implemented to refine existing non-iterative MDRNN models. The idea is to iteratively collect rollouts generated with the planning agent's policy and retrain the MDRNN on the collected rollouts, refining the model over time.
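Schematically, the procedure corresponds to the loop below. This is a simplified sketch: every function is a placeholder stub standing in for the repository's data generator, MDRNN trainer and planning tester.

# Placeholder stubs standing in for the real components of the repository.
def collect_rollouts(agent, mdrnn, num_rollouts):
    return [{"frames": [], "actions": [], "rewards": []} for _ in range(num_rollouts)]

def train_mdrnn(mdrnn, rollouts, max_epochs=10):
    return mdrnn  # stand-in for one retraining pass over the collected rollouts

def run_planning_tests(agent, mdrnn, scenario):
    return 0.0    # stand-in for the benchmark score after this iteration

# Simplified view of the iterative refinement loop.
def iterative_training(agent, mdrnn, num_iterations=4, num_rollouts=500):
    collected = []
    for iteration in range(num_iterations):
        collected += collect_rollouts(agent, mdrnn, num_rollouts)   # act with the planning agent
        mdrnn = train_mdrnn(mdrnn, collected)                       # refine the world model
        run_planning_tests(agent, mdrnn, "planning_whole_random_track")
    return mdrnn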

To start iterative training, follow these steps:

  1. Copy an existing MDRNN to use as a baseline and rename it, e.g. World_Model_Random --> World_Model_Iter_A
  2. Set the experiment name to the new filename: experiment_name: "World_Model_Iter_A"
  3. Set the parameter is_iterative_train_mdrnn: true
  4. Run

You can likewise adjust different iterative training parameters here:

"iterative_trainer": {
    "iterative_data_dir": "data_iterative",
    "sequence_length": 100,
    "num_rollouts": 500,
    "num_iterations": 4,
    "max_epochs": 10,
    "test_scenario": "planning_whole_random_track",
    "fixed_cpu_cores": null,
    "max_test_threads": 3,
    "replay_buffer": {
        "is_replay_buffer": true,
        "max_buffer_size": 50000
    }
}

All generated rollouts are stored in iterative_data_dir. The baseline model remains untouched and a new iterative model is created with "iterative_" as a prefix, e.g. iterative_World_Model_Iter_A. For each iteration, the trained models are stored as backups in mdrnn/checkpoints/backup, in case you want to test models from earlier iterations or rerun planning tests on all iterative models with iteration_retester.py.

Replay buffer: The replay buffer persists all rollouts throughout the iterative training, allowing the trainer to randomly sample rollouts from previous iterations when training the MDRNN, to avoid forgetting the past. The maximum capacity can be set with max_buffer_size; when the replay buffer is full, the oldest rollouts are gradually replaced by new ones.
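A minimal sketch of such a fixed-capacity buffer (an illustration of the idea, not the repository's implementation):

import random
from collections import deque

# Illustrative fixed-capacity replay buffer, not the repository's implementation.
class ReplayBuffer:
    def __init__(self, max_buffer_size=50000):
        # a deque with maxlen silently drops the oldest rollouts once full
        self.rollouts = deque(maxlen=max_buffer_size)

    def add(self, rollout):
        self.rollouts.append(rollout)

    def sample(self, num_rollouts):
        # uniform random sampling across rollouts from all previous iterations
        return random.sample(list(self.rollouts), min(num_rollouts, len(self.rollouts)))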

Planning Algorithms

The planning algorithms available are:

  1. RHEA - Rolling Horizon Evolutionary Algorithm
  2. RMHC - Random Mutation Hill Climbing
  3. MCTS - Monte Carlo Tree Search

To choose an agent, set the parameter to either RHEA, RMHC or MCTS

  • "planning: { "planning_agent": RHEA }"

All other agent parameters can be adjusted as needed:

"planning": {
        "planning_agent": "RHEA",
           "rolling_horizon": {
            "population_size": 4,
            "horizon": 10,
            "max_generations": 15,
            "is_shift_buffer": false
        },
        "random_mutation_hill_climb": {
            "horizon": 50,
            "max_generations": 15,
            "is_shift_buffer": false,
        },
        "monte_carlo_tree_search": {
            "max_rollouts": 50,
            "rollout_length": 20,
            "temperature": 1.41,
            "is_discrete_delta": true
        }

RMHC and RHEA Evolutionary parameters

You can adjust the evolutionary parameters used by RMHC or RHEA (i.e. mutation, crossover, etc.) in the config file:

"evolution_handler": {
    "selection_method": "rank",
    "genetic_operator": "crossover_mutation",
    "crossover_method": "uniform",
    "mutation_method": "subset_mutation",
    "mutation_probability": 0.20,
    "tournament_percentage": 0.5,
    "random_seed": null
},

The available parameter options are:

    "mutation_options":["single_uniform", "all_uniform", "subset_mutation"],
    "RHEA_genetic_operator_options": ["crossover","mutation","crossover_mutation"],
    "RHEA_selection_options": ["uniform","tournament","rank","roulette"],
    "RHEA_crossover_methods_options": ["uniform","1_bit","2_bit"]

You can extend these options with new evolutionary settings in tuning/evolution_handly.py
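To give a feel for what two of these options correspond to, here are simplified sketches of uniform crossover and rank selection over action-sequence genomes. They are illustrations only; the repository's own operators may differ in detail.

import numpy as np

rng = np.random.default_rng()

# Uniform crossover: each time step's action is copied from a randomly chosen parent.
def uniform_crossover(parent_a, parent_b):
    mask = rng.random(len(parent_a)) < 0.5
    child = parent_a.copy()
    child[mask] = parent_b[mask]
    return child

# Rank selection: selection probability grows linearly with fitness rank
# (rank 1 = worst, rank N = best).
def rank_selection(population, fitnesses):
    order = np.argsort(fitnesses)
    ranks = np.empty(len(population))
    ranks[order] = np.arange(1, len(population) + 1)
    probabilities = ranks / ranks.sum()
    return population[rng.choice(len(population), p=probabilities)]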

NTBEA Parameter tuning

N-Tuple Bandit Evolutionary Algorithm (NTBEA) has been implemented to perform parameter tuning on the planning parameters. The implementation is based on https://github.com/bam4d/NTBEA, and more information about NTBEA can be found in the paper: Lucas, Liu and Perez-Liebana, "The N-Tuple Bandit Evolutionary Algorithm for Game Agent Optimisation", 2018. https://arxiv.org/abs/1802.05991

To run NTBEA tuning set the following configuration:

  1. is_ntbea_param_tune: true
  2. Select World Model: "experiment_name": "NAME OF WORLD MODEL"
  3. Select an agent: "planning": { "planning_agent": "RHEA" }
  4. Run

Planning Benchmarks

To run planning benchmarks of the agents and world model, set the following parameters:

  1. Select World Model: "experiment_name": "NAME OF WORLD MODEL"
  2. Select an agent: "planning": { "planning_agent": "RHEA" }
  3.  "test_suite": {
         "is_run_planning_tests":  true,
         "is_reload_planning_session": false,
         "trials": 100,
         "is_multithread_tests": test,
         "is_multithread_trials": true,
         "fixed_cores": null,
         "is_logging": true,
     }
    

Replay benchmark session

To replay a benchmark session set the following parameters and run:

    "test_suite": {
        "is_run_model_tests": false,
        "is_run_planning_tests":  true,
        "is_reload_planning_session": true,
        "trials": 3,
        "planning_session_to_load": "name of file without '.pickle'"
    },

Show training, test and planning logs

The system uses tensorboard to log results from train/test sessions of the VAE and MDRNN, as well as results from the benchmarks within the planning tester. You can access the logs by starting a tensorboard server with the following command in a terminal:

tensorboard --logdir ../utility/logging/tensorboard_runs --samples_per_plugin text=500

Otherwise, you can use one of the premade scripts:

    scripts/start_tensorboard.bat -- Windows
    scripts/start_tensorboard.sh  -- Linux/macOS

The logs are stored in utility/logging/tensorboard_runs

Live Play

To deploy the agent on a random game, set the following parameters:

  1. Select World Model: "experiment_name": "NAME OF WORLD MODEL"
  2. Select an agent: "planning": { "planning_agent": "RHEA" }
  3. "is_play": true
  4. Run

Play in Dream

To enable playing in dream, use the above parameters but enable:

  • "is_dream_play": true

Manual Play

To manually control the car, use the above parameters but enable:

  • "is_manual_control": true

NB: You can likewise drive in the dream by enabling is_dream_play

Existing Models

The solution comes with a set of pretrained models that can be used. The best performing model is Model L.

  1. To select a world model, go to the config and set the experiment name attribute to a world model name: "experiment_name": "SELECTED_WORLD_MODEL"

  2. In the config, set latent_size and hidden_units according to the parameters below, e.g.:

    • Example:
      experiment_name: World_Model_A
      latent_size    : 64
      hidden_units   : 512 // MDRNN units
      
  3. Run the program

Models and their parameters


World Model Name             | Epochs | Sequence | Latent size | MDRNN hidden units | Rollouts used for training
World_Model_HaRandom         | 60     | 500      | 64          | 512                | 10k_ExpertRandom
World_Model_Random           | 60     | 500      | 64          | 256                | 10k_Random
iterative_World_Model_Iter_A | 60     | 64       | 64          | 256                | 10k_Random (10 iterations)

Running the system on a headless server

If you run the system on a headless server, you will need xvfb installed on it.

You can use the following xvfb command to run the system:

xvfb-run -a -s "-screen 0 1280x1024x24" -- python main.py

Otherwise you may use the script

run_headless.sh

NB: To show the tensorboard logs remotely, you may use ngrok (https://ngrok.com/) to reroute the localhost address and make it accessible online. Otherwise, you can use a different address/port when serving the tensorboard server on the cluster.

How to extend the system

The system has been built so that you can extend it with new environments, planning algorithms and more. The system uses factories to dynamically return the desired dependencies based on the environment you are using.

Extending with new environments

To extend the system with new environments:

  1. Create a new environment py file in environment/new_env_folder/new_env.py
  2. Create a class and let it extend BaseEnvironment from environment/base_environment.py
  3. Implement the missing functions reset(self, seed=None) and step(self, action, ignore_is_done=False)
  4. Register the new environment in environment/environment_factory.py
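A minimal skeleton of such an environment might look like the following. The constructor signature, the wrapped gym game and the returned tuple layout are assumptions for illustration; only the two required methods come from the base class described above.

import gym
from environment.base_environment import BaseEnvironment

# Skeleton for a hypothetical new environment; the constructor signature and
# the wrapped gym environment are assumptions for illustration.
class NewEnvironment(BaseEnvironment):
    def __init__(self, config):
        super().__init__(config)
        self.env = gym.make("LunarLanderContinuous-v2")

    def reset(self, seed=None):
        if seed is not None:
            self.env.seed(seed)
        return self.env.reset()

    def step(self, action, ignore_is_done=False):
        state, reward, is_done, info = self.env.step(action)
        if ignore_is_done:
            is_done = False
        return state, reward, is_done, info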

When you have created the new environment, you will also need to implement an action sampler specific to that environment. The purpose is to fully decouple the environments from the rest of the system and to let you customize how actions are sampled and whether they need any pre-processing.

  1. Follow the above instructions to create a new environment
  2. Create a new action sampler py file in environment/new_env_folder/new_env_action_sampler.py
  3. Create a class and let it extend BaseActionSampler from environment/actions/base_action_sampler.py
  4. Register the new action_sampler in environment/action_sampler_factory.py
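A corresponding skeleton for the action sampler could look like this. The method names below are illustrative guesses, so match them to the actual abstract methods of BaseActionSampler when implementing.

import numpy as np
from environment.actions.base_action_sampler import BaseActionSampler

# Skeleton for a hypothetical action sampler; method names are illustrative
# guesses and should be aligned with the base class.
class NewEnvActionSampler(BaseActionSampler):
    def __init__(self, config):
        super().__init__(config)
        self.rng = np.random.default_rng()

    def sample(self):
        # e.g. a continuous 2D action with each component in [-1, 1]
        return self.rng.uniform(-1.0, 1.0, size=2)

    def preprocess(self, action):
        # clip to the valid action range before passing it to the environment
        return np.clip(action, -1.0, 1.0)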

Extending with new data generators

If you have implemented a new environment, you will likewise want to extend the system with a rollout generator that can generate rollouts for you. The rollout generator is used to generate standard rollouts as well as during iterative training.

  1. Create a new rollout generator py file in utility/rollout_handling/new_env/new_env_rollout_generator.py
  2. Create a class and let it extend BaseRolloutGenerator from utility/rollout_handling/base_rollout_generator.py
  3. Implement the missing functions that define how the rollouts are generated and collected - usually you want to preprocess the frames to 64x64 here
  4. Register the new rollout generator in utility/rollout_handling/rollout_generator_factory.py
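A skeleton for the rollout generator might look like the following. The method names, constructor arguments and the rollout dictionary layout are illustrative assumptions, not the repository's actual interface.

import numpy as np
from utility.rollout_handling.base_rollout_generator import BaseRolloutGenerator

# Skeleton for a hypothetical rollout generator; names and the rollout layout
# are assumptions for illustration.
class NewEnvRolloutGenerator(BaseRolloutGenerator):
    def __init__(self, config, environment, action_sampler):
        super().__init__(config)
        self.environment = environment
        self.action_sampler = action_sampler

    def generate_rollout(self, max_steps=1000):
        frames, actions, rewards = [], [], []
        state = self.environment.reset()
        for _ in range(max_steps):
            action = self.action_sampler.sample()
            state, reward, is_done, _ = self.environment.step(action)
            frames.append(self._preprocess(state))  # store the downscaled frame
            actions.append(action)
            rewards.append(reward)
            if is_done:
                break
        return {"frames": np.array(frames), "actions": np.array(actions), "rewards": np.array(rewards)}

    def _preprocess(self, frame):
        # placeholder: apply the 64x64 resizing/cropping that the VAE expects
        return frame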

Extending with new test suites

If you have implemented a new environment, you will likewise want to extend the system with a test suite that allows you to test the agent's planning performance in the new environment. For this you can implement a Planning Tester and/or a Model Tester. The planning tester contains different scenarios that test the agent's planning capabilities and is also used during iterative training. The model tester contains different hardcoded action sequences to test how well the model represents the dynamics of the environment. Hardcoded actions are used instead of the agent because we want to test the model in isolation, so that the results are not affected by the agent's behavior.

Extending the planning tester

To extend the planning tester

  1. Create a new planning tester py file in tests_custom/new_env/new_env_planning_tester.py
  2. Create a class and let it extend BasePlanningTester from tests_custom/base_planning_tester.py
  3. Implement the missing functions and tests
  4. Register the new planning tester in tests_custom/test_suite_factory.py
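A heavily simplified skeleton of a planning tester is shown below. The constructor arguments and method names are illustrative guesses; align them with the actual abstract methods of BasePlanningTester.

from tests_custom.base_planning_tester import BasePlanningTester

# Skeleton for a hypothetical planning tester; all names besides the base
# class and its module path are illustrative guesses.
class NewEnvPlanningTester(BasePlanningTester):
    def __init__(self, config, vae, mdrnn, environment, planning_agent):
        super().__init__(config, vae, mdrnn, environment, planning_agent)

    def get_test_functions(self):
        # map scenario names (as referenced from the config) to test functions
        return {"planning_whole_random_track": self._whole_random_track_test}

    def _whole_random_track_test(self, args):
        # run one or more trials with the planning agent here and return the
        # rewards/metadata to log; this placeholder just returns an empty result
        return []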

Extending the model tester

To extend the model tester

  1. Create a new model tester py file in tests_custom/new_env/new_env_model_tester.py
  2. Create a class and let it extend BaseTester from tests_custom/base_tester.py
  3. Implement the missing functions and tests
  4. Register the new model tester in tests_custom/test_suite_factory.py

Vizdoom integration issue

The Vizdoom environment is partially implemented (environment, data generator, planning tester). However, we are currently unable to train the MDRNN on it due to varying sequence lengths. The MDRNN and its trainer need to be extended to support variable-length sequences within batches, which requires modifying the dataloader utility/rollout_handling/mdrnn_loaders.py and the MDRNN trainer mdrnn/mdrnn_trainer.py. Our suggestion is to try training the MDRNN with single batches (batch_size = 1), since that bypasses the issue; however, you may still get exceptions, as we did the last time we tried to run it.

Copyright

Copyright (c) 2020 - All Rights Reserved. All files are part of the Evolutionary Planning in Latent Space paper. Unauthorized distribution of the project, via any medium, is strictly prohibited without the consent of the authors and ITU.

Authors: