Paper written by Thor V.A.N. Olesen, Dennis Thinh Tan Nguyen, Rasmus Berg Palm and Sebastian Risi: https://arxiv.org/abs/2011.11293
This is the code repository used to demonstrate Evolutionary Planning in Latent Space on a Learned World Model. In this repository we have extended the World Models implementation to support an iterative training procedure with evolutionary planning, backed by a modular architecture that allows future extensions with new environments, test bases and planning algorithms.
The goal of this work is to use evolution to do online planning in latent space on a learned model of the world, and it is inspired by the paper: Ha and Schmidhuber, "World Models", 2018. https://doi.org/10.5281/zenodo.1207631
Similar to the World Models paper, we use a Convolutional Variational Auto Encoder (ConvVAE) to learn a representation of the world and a Mixture Density Recurrent Neural Network (MDRNN) to learn the dynamics of the world.
While the paper uses a simple linear policy network (controller) to produce actions, we use evolutionary planning algorithms such as Random Mutation Hill Climbing (RMHC) and the Rolling Horizon Evolutionary Algorithm (RHEA) with the world model as a forward model to perform online planning. More information about RHEA can be found in the paper: Raluca D. Gaina, Sam Devlin, Simon M. Lucas, and Diego Perez-Liebana, "Rolling Horizon Evolutionary Algorithms for General Video Game Playing", 2020. https://arxiv.org/pdf/2003.12331.pdf
Finally, we have focused our attention on planning in the single-player real-time game 'Car-Racing' by OpenAI (https://gym.openai.com/envs/CarRacing-v0/), but the system supports other environments too.
The system is written in Python 3.7 and utilizes PyTorch 1.5. Please refer to https://pytorch.org/ for PyTorch installation details.
The remaining dependencies are as follows:
- Gym environment dependencies
conda gym[all]
box2d
- Vizdoom
vizdoom - https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md
- Others
tqdm : Progress bars
numpy : matrix operations
matplotlib : visualization
tensorboard : Logging
torchvision : Torch related
pytorch : Torch related
dill : File serialization
colorama : Enables multiline printing in CMD for Windows
The hyperparameters are stored in a JSON file, config.json. The next few sections only showcase the essentials, but you are welcome to play around with the other parameters.
To generate new rollouts, set the following hyperparameters in the config file:
is_generate_data: true
data_generator: {rollouts: 10000 }
By default, a random policy is used but a good policy can be enabled with is_ha_agent_driver: true
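For reference, a minimal sketch of what collecting one random-policy rollout for CarRacing could look like (this is an illustration only; the actual data generator, its storage format, and the is_ha_agent_driver policy live in the repository's data generator module):

import gym
import numpy as np

def collect_random_rollout(max_steps=1000):
    # Collect one rollout of (frame, action, reward, done) tuples with a random policy.
    env = gym.make("CarRacing-v0")
    frames, actions, rewards, dones = [], [], [], []
    obs = env.reset()
    for _ in range(max_steps):
        action = env.action_space.sample()  # random policy; a trained policy could be used instead
        next_obs, reward, done, _ = env.step(action)
        frames.append(obs)
        actions.append(action)
        rewards.append(reward)
        dones.append(done)
        obs = next_obs
        if done:
            break
    env.close()
    return np.array(frames), np.array(actions), np.array(rewards), np.array(dones)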
This section explains how the VAE and MDRNN can be configured for training or reloading.
To train a new VAE model, set the following hyperparameters in the config file:
experiment_name: "SOME_NAME_YOU_DEFINE"
is_train_vae: true
latent_size: 64
vae_trainer: { "max_epochs": 20, "batch_size": 35, "learning_rate": 0.0001 }
- Run
To reload a VAE, set the parameters accordingly:
is_train_vae: false
experiment_name: "SOME_NAME_YOU_DEFINE"
To train a new MDRNN model, set the following hyperparameters in the config file:
experiment_name: "SOME_NAME_YOU_DEFINE"
is_train_mdrnn: true
latent_size: 64
mdrnn_trainer: { "max_epochs": 20, "learning_rate": 0.001, "sequence_length": 500, "train_test_files_ratio": 0.8, "is_random_sampling": true }
"mdrnn": { "hidden_units": 512 }
- Run
NB: The current MDRNN trainer does not support batches with variable sequence lengths. This requires that you either extend the MDRNN trainer with padding/packing of sequences (see the sketch below) or use batch_size = 1
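A minimal sketch of the padding/packing approach with PyTorch is shown here; the collate function name and how it would plug into the existing dataloader and trainer are assumptions, not part of the current code:

import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

def collate_variable_length(batch):
    # batch: list of (latent_sequence, action_sequence) tensors with different lengths
    lengths = torch.tensor([seq.shape[0] for seq, _ in batch])
    latents = pad_sequence([seq for seq, _ in batch], batch_first=True)
    actions = pad_sequence([act for _, act in batch], batch_first=True)
    return latents, actions, lengths

def run_rnn_on_padded(rnn, inputs, lengths):
    # Pack the padded batch so the RNN skips padded timesteps, then unpad the output.
    packed = pack_padded_sequence(inputs, lengths, batch_first=True, enforce_sorted=False)
    packed_out, _ = rnn(packed)
    out, _ = pad_packed_sequence(packed_out, batch_first=True)
    return out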
To reload an MDRNN, set the parameters accordingly:
is_train_mdrnn: false
experiment_name: "SOME_NAME_YOU_DEFINE"
- Ensure that hidden_units and latent_size are set exactly to the values used to train the MDRNN.
An iterative procedure has been implemented to refine existing non-iterative MDRNN models. The idea is to iteratively collect rollouts generated with the planning agent's policy and retrain the MDRNN on the collected rollouts to refine the model over time.
To start iterative training, set the following parameters:
- Copy an existing MDRNN to use as a baseline and rename it, e.g. World_Model_Random --> World_Model_Iter_A
- Set the experiment name parameter to the new filename: experiment_name: "World_Model_Iter_A"
- Set the parameter is_iterative_train_mdrnn: true
- Run
You can likewise adjust the other iterative training parameters here:
"iterative_trainer": {
"iterative_data_dir": "data_iterative",
"sequence_length": 100,
"num_rollouts": 500,
"num_iterations": 4,
"max_epochs": 10,
"test_scenario": "planning_whole_random_track",
"fixed_cpu_cores": null,
"max_test_threads": 3,
"replay_buffer": {
"is_replay_buffer": true,
"max_buffer_size": 50000
}
}
All generated rollouts are stored in iterative_data_dir. The baseline model remains untouched and a new iterative model is created with the "iterative_" prefix, e.g. iterative_World_Model_Iter_A. For each iteration, the trained models are stored as backups in mdrnn/checkpoints/backup in case you want to test models from earlier iterations or rerun planning tests on all iterative models with iteration_retester.py.
Replay buffer: The replay buffer persists rollouts throughout the iterative training, allowing the trainer to randomly sample rollouts from previous iterations when training the MDRNN, to avoid forgetting the past. The max capacity can be set with max_buffer_size, and when the replay buffer is full, the oldest rollouts are gradually replaced by new rollouts.
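A minimal sketch of how the iterative loop and a bounded replay buffer could fit together (generate_planning_rollouts, train_mdrnn and run_planning_tests are placeholder names, not the repository's actual functions):

import random
from collections import deque

def iterative_training(num_iterations, num_rollouts, max_buffer_size, sample_size,
                       generate_planning_rollouts, train_mdrnn, run_planning_tests):
    # Oldest rollouts are dropped automatically once the buffer reaches max_buffer_size.
    replay_buffer = deque(maxlen=max_buffer_size)
    for iteration in range(num_iterations):
        # 1) Collect new rollouts with the planning agent acting in the real environment.
        replay_buffer.extend(generate_planning_rollouts(num_rollouts))
        # 2) Retrain the MDRNN on a random mix of old and new rollouts to avoid forgetting.
        training_rollouts = random.sample(list(replay_buffer), min(sample_size, len(replay_buffer)))
        train_mdrnn(training_rollouts)
        # 3) Benchmark the refined model on the configured test scenario.
        run_planning_tests(iteration)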
The planning algorithms available are:
- RHEA - Rolling Horizon Evolutionary Algorithm
- RMHC - Random Mutation Hill Climbing
- MCTS - Monte Carlo Tree Search
To choose an agent, set the parameter to either RHEA, RMHC or MCTS
"planning: { "planning_agent": RHEA }"
All other agent parameters can be adjusted as needed:
"planning": {
"planning_agent": "RHEA",
"rolling_horizon": {
"population_size": 4,
"horizon": 10,
"max_generations": 15,
"is_shift_buffer": false
},
"random_mutation_hill_climb": {
"horizon": 50,
"max_generations": 15,
"is_shift_buffer": false,
},
"monte_carlo_tree_search": {
"max_rollouts": 50,
"rollout_length": 20,
"temperature": 1.41,
"is_discrete_delta": true
  }
}
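Conceptually, all of these agents evaluate candidate action sequences by rolling the learned world model forward in latent space and summing the predicted rewards. A simplified RMHC-style sketch is given here; the forward_model and action_sampler interfaces are assumptions for illustration, not the repository's actual MDRNN or agent API:

import numpy as np

def evaluate_action_sequence(forward_model, latent, hidden, actions):
    # Roll the learned model forward in latent space and accumulate predicted reward.
    total_reward = 0.0
    for action in actions:
        latent, reward, hidden = forward_model(latent, action, hidden)
        total_reward += reward
    return total_reward

def rmhc_plan(forward_model, latent, hidden, action_sampler, horizon, max_generations):
    # Random Mutation Hill Climbing: keep mutating the best action sequence found so far.
    best = [action_sampler() for _ in range(horizon)]
    best_score = evaluate_action_sequence(forward_model, latent, hidden, best)
    for _ in range(max_generations):
        candidate = list(best)
        candidate[np.random.randint(horizon)] = action_sampler()  # mutate one random action
        score = evaluate_action_sequence(forward_model, latent, hidden, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best[0]  # execute only the first action, then replan at the next step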
You can adjust the evolutionary parameters used by RMHC or RHEA (e.g. mutation, crossover) in the config file:
"evolution_handler": {
"selection_method": "rank",
"genetic_operator": "crossover_mutation",
"crossover_method": "uniform",
"mutation_method": "subset_mutation",
"mutation_probability": 0.20,
"tournament_percentage": 0.5,
"random_seed": null
},
The available parameter options are listed below:
"mutation_options":["single_uniform", "all_uniform", "subset_mutation"],
"RHEA_genetic_operator_options": ["crossover","mutation","crossover_mutation"],
"RHEA_selection_options": ["uniform","tournament","rank","roulette"],
"RHEA_crossover_methods_options": ["uniform","1_bit","2_bit"]
You can extend these options with new evolutionary settings in tuning/evolution_handly.py; a hypothetical example of a new mutation method is sketched below.
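The exact signature expected by the evolution handler is an assumption here; this sketch only illustrates the idea of adding a new mutation operator:

import numpy as np

def gaussian_mutation(action_sequence, mutation_probability, sigma=0.1):
    # Perturb each selected action with Gaussian noise instead of resampling it uniformly.
    mutated = np.array(action_sequence, dtype=float)
    for i in range(len(mutated)):
        if np.random.rand() < mutation_probability:
            mutated[i] = mutated[i] + np.random.normal(0.0, sigma, size=mutated[i].shape)
    return mutated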
N-Tuple Bandit Evolutionary Algorithm (NTBEA) has been implemented to perform parameter tuning on the planning parameters. The implementation is based on https://github.com/bam4d/NTBEA and more information about NTBEA can be found in the paper: Lucas, Liu, Perez-Liebana, "The N-Tuple Bandit Evolutionary Algorithm for Game Agent Optimisation", 2018. https://arxiv.org/abs/1802.05991
To run NTBEA tuning, set the following configuration:
is_ntbea_param_tune: True
- Select World Model:
"experiment_name": "NAME OF WORLD MODEL"
- Select an agent:
"planning: { "planning_agent": RHEA }"
- Run
To run planning benchmarks of the agents + world model, set the following parameters:
- Select World Model:
"experiment_name": "NAME OF WORLD MODEL"
- Select an agent:
"planning: { "planning_agent": RHEA }"
-
"test_suite": { "is_run_planning_tests": true, "is_reload_planning_session": false, "trials": 100, "is_multithread_tests": test, "is_multithread_trials": true, "fixed_cores": null, "is_logging": true, }
To replay a benchmark session set the following parameters and run:
"test_suite": {
"is_run_model_tests": false,
"is_run_planning_tests": true,
"is_reload_planning_session": true,
"trials": 3,
"planning_session_to_load": "name of file without '.pickle'"
},
The system uses TensorBoard to log train/test sessions of the VAE and MDRNN as well as results from the benchmarks in the planning tester. You can access the logs by starting a TensorBoard server with the following command in a terminal:
tensorboard --logdir ../utility/logging/tensorboard_runs --samples_per_plugin text=500
Otherwise you can use the premade scripts in
scripts/start_tensorboard.bat -- Windows
scripts/start_tensorboard.sh -- Linux/macOS
The logs are stored in utility/logging/tensorboard_runs
To deploy the agent on a random game, set the following parameters:
- Select World Model:
"experiment_name": "NAME OF WORLD MODEL"
- Select an agent:
"planning: { "planning_agent": RHEA }"
"is_play": true
- Run
To enable playing in dream, use the above parameters but enable:
"is_dream_play": true
To manually control the car, use the above parameters but enable:
"is_manual_control": true
NB: You can likewise drive in the dream by enabling is_dream_play
The solution comes with a set of pretrained models that can be used. The best performing model is Model L.
- To select a world model, go to the config and set the experiment_name attribute to one of the world model names below:
"experiment_name": "SELECTED_WORLD_MODEL"
- In the config, set latent_size and hidden_units according to the parameters listed below. Example:
experiment_name: World_Model_A
latent_size: 64
hidden_units: 512 // MDRNN units
- Run the program
Models and their parameters
World Model Name | Parameters (epochs, sequence, latent_size, mdrnn_hidden_units, rollouts used for training)
World_Model_HaRandom | 60_epoch 500Seq 64Latent 512Hidden 10k_ExpertRandom
World_Model_Random | 60_epoch 500Seq 64Latent 256Hidden 10k_Random
iterative_World_Model_Iter_A | 60_epoch 64Seq 64Latent 256Hidden 10k_Random - 10 iterations
If you run the system on a headless server, you will need xvfb installed on it.
You can use the following xvfb command to run the system:
xvfb-run -a -s "-screen 0 1280x1024x24" -- python main.py
Otherwise you may use the script
run_headless.sh
NB: To show TensorBoard logs remotely, you may use ngrok (https://ngrok.com/) to reroute the localhost address and make it accessible online. Otherwise you can also use a different address/port when serving the TensorBoard server on the cluster.
The system has been built such that you can extend it with new environments, planning algorithms and more. The system uses factories to dynamically return the desired dependencies based on which environment you are using.
To extend the system with new environments:
- Create a new environment py file in environment/new_env_folder/new_env.py
- Create a class and let it extend BaseEnvironment from environment/base_environment.py
- Implement the missing functions reset(self, seed=None) and step(self, action, ignore_is_done=False)
- Register the new environment in environment/environment_factory.py (a sketch of such a class is shown below)
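A hypothetical skeleton of such an environment; the constructor signature, the wrapped gym environment, and any BaseEnvironment members beyond reset and step are assumptions:

import gym
from environment.base_environment import BaseEnvironment

class NewEnvironment(BaseEnvironment):
    def __init__(self, config):
        super().__init__(config)
        self.env = gym.make("SomeGymEnvironment-v0")  # hypothetical underlying gym environment

    def reset(self, seed=None):
        if seed is not None:
            self.env.seed(seed)
        return self.env.reset()

    def step(self, action, ignore_is_done=False):
        obs, reward, done, info = self.env.step(action)
        if ignore_is_done:
            done = False
        return obs, reward, done, info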
When you have created the new environment, you will need to implement an action sampler that is specific to the environment. The purpose is to fully decouple the environments from our system and allow one to customize how actions are sampled and whether they need any pre-processing.
- Follow the above instruction to create a new environment
- Create a new action sampler py file in environment/new_env_folder/new_env_action_sampler.py
- Create a class and let it extend BaseActionSampler from environment/actions/base_action_sampler.py
- Register the new action_sampler in environment/action_sampler_factory.py (see the sketch below)
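A hypothetical action sampler skeleton; the method names and the 3-dimensional continuous action space are assumptions, since the real interface is defined by BaseActionSampler:

import numpy as np
from environment.actions.base_action_sampler import BaseActionSampler

class NewEnvActionSampler(BaseActionSampler):
    def sample_action(self):
        # Sample a random action in the new environment's (assumed continuous) action space.
        return np.random.uniform(low=-1.0, high=1.0, size=3)

    def preprocess_action(self, action):
        # Clip/convert the action before it is passed to the environment.
        return np.clip(action, -1.0, 1.0)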
If you have implemented a new environment, you will likewise want to extend the system with a rollout generator that can generate rollouts for you. The rollout generator is used to generate standard rollouts as well as during iterative training.
- Create a new rollout generator py file in utility/rollout_handling/new_env/new_env_rollout_generator.py
- Create a class and let it extend BaseRolloutGenerator from utility/rollout_handling/base_rollout_generator.py
- Implement the missing functions that define how the rollouts are generated and collected - usually you want to preprocess the frames to 64x64 here
- Register the new rollout generator in utility/rollout_handling/rollout_generator_factory.py (see the sketch below)
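A hypothetical rollout generator skeleton; the method name, the rollout format, and the use of PIL for resizing are assumptions chosen for illustration:

import numpy as np
from PIL import Image
from utility.rollout_handling.base_rollout_generator import BaseRolloutGenerator

class NewEnvRolloutGenerator(BaseRolloutGenerator):
    def generate_rollout(self, environment, action_sampler, max_steps=1000):
        frames, actions, rewards = [], [], []
        obs = environment.reset()
        for _ in range(max_steps):
            action = action_sampler.sample_action()
            next_obs, reward, done, _ = environment.step(action)
            frames.append(np.array(Image.fromarray(obs).resize((64, 64))))  # preprocess frames to 64x64
            actions.append(action)
            rewards.append(reward)
            obs = next_obs
            if done:
                break
        return np.array(frames), np.array(actions), np.array(rewards)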
If you have implemented a new environment, you will likewise want to extend the system with a test suite that allows you to test the agent's planning performance in the new environment. For this you can implement a Planning Tester and/or a Model Tester. The planning tester contains different scenarios that test the agent's planning capabilities and is also used during iterative training. The Model Tester contains different hardcoded actions to test how the model represents the dynamics of the environment. The hardcoded actions are used instead of the agent, since we want to test the model in isolation, such that the results are not affected by the agent's behavior.
To extend the planning tester
- Create a new planning tester py file in tests_custom/new_env/new_env_planning_tester.py
- Create a class and let it extend BasePlanningTester from tests_custom/base_planning_tester.py
- Implement the missing functions and tests
- Register the new planning tester in tests_custom/test_suite_factory.py (a skeleton is sketched below)
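A hypothetical planning tester skeleton; the hook names, the self.environment member and the agent.search call are assumptions, since the actual abstract methods are defined in base_planning_tester.py:

from tests_custom.base_planning_tester import BasePlanningTester

class NewEnvPlanningTester(BasePlanningTester):
    def get_test_functions(self):
        # Map test scenario names to the test functions this tester supports.
        return {"planning_whole_random_track": self._test_whole_random_track}

    def _test_whole_random_track(self, agent):
        # Run one trial: let the agent plan from a random start and return the total reward.
        obs = self.environment.reset()
        total_reward, done = 0.0, False
        while not done:
            action = agent.search(obs)
            obs, reward, done, _ = self.environment.step(action)
            total_reward += reward
        return total_reward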
To extend the model tester
- Create a new model tester py file in tests_custom/new_env/new_env_model_tester.py
- Create a class and let it extend BaseTester from tests_custom/base_tester.py
- Implement the missing functions and tests
- Register the new model tester in tests_custom/test_suite_factory.py
The Vizdoom environment is partially implemented (env, datagenerator, planning_tester).
However, we are currently unable to train the MDRNN due to varying sequence lengths. The MDRNN and its trainer need to be extended to support variable-length sequences within batches (see the padding/packing sketch in the MDRNN section above). This requires modifying the dataloader utility/rollout_handling/mdrnn_loaders.py and the MDRNN trainer mdrnn/mdrnn_trainer.py. What we suggest is to try training the MDRNN with single-sample batches (batch_size=1), since that would bypass the issue. However, you may still get exceptions, since we got them the last time we tried to run it.
Copyright (c) 2020, - All Rights Reserved - All files are part of the Evolutionary Planning in Latent Space paper. Unauthorized distribution of the project, via any medium is strictly prohibited without the consensus of the authors and ITU.
Authors:
- Thor V.A.N. Olesen thorolesen@gmail.com
- Dennis T.T. Nguyen dennisnguyen3000@yahoo.dk.