EDM-DRL: Toward Stable Reinforcement Learning through Evolutionary Algorithms

Abstract

Deep reinforcement learning (DRL) has experienced tremendous growth in the past few years. However, the training stability of agents remains an open research question. Here we present Ensembled Directed Mutation of Deep Reinforcement Learning (EDM-DRL), a hybridization of evolutionary computing (EC), ensemble learning, and DRL methods, as a means of mitigating training instability in DRL agents. We show that our method trains more consistently than DRL baselines alone. We also show that by employing our novel mutation and ensemble methods, the performance of DRL agents can be improved at test time without sacrificing training stability. Further, we show that, while using a similar number of environment time steps, the EDM-DRL algorithm performs 1% or fewer of the network parameter updates used by Advantage Actor Critic (A2C). Finally, we conduct an ablation study to identify the components of the EDM-DRL algorithm responsible for the largest contributions.
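
For readers new to the approach, the abstract describes a loop that maintains a population of agents (EC), mutates them in a gradient-informed ("directed") way, and combines survivors into an ensemble at test time. The sketch below is a minimal illustration of what such a loop could look like, not the repository's implementation: `evaluate_fn` (an episode-return score) and `grad_fn` (a policy-gradient estimate) are hypothetical callables, and the population size, elite fraction, and mutation scale are arbitrary.

```python
# Minimal sketch of a population loop with gradient-directed mutation.
# Illustrative only; the real ea.py / runner.py interfaces differ.
import copy
import torch

def directed_mutation(policy, grads, sigma=0.01):
    """Copy a policy and perturb it: a small gradient step plus Gaussian noise."""
    child = copy.deepcopy(policy)
    with torch.no_grad():
        for p, g in zip(child.parameters(), grads):
            p.add_(-sigma * g + sigma * torch.randn_like(p))
    return child

def evolve(population, evaluate_fn, grad_fn, generations=50, elite_frac=0.5):
    """Score members, keep the elites, and refill with directed mutations of them."""
    for _ in range(generations):
        scored = sorted(population, key=evaluate_fn, reverse=True)
        n_elite = max(1, int(elite_frac * len(scored)))
        elites = scored[:n_elite]
        children = [directed_mutation(e, grad_fn(e)) for e in elites]
        population = (elites + children)[:len(scored)]
    return population  # survivors can be combined into a test-time ensemble
```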

Results

The graphs corresponding to the results of this work are shown below; the generated images are stored in /graphics.

As shown, although the A2C algorithm solves the environment faster on average, the EDM-DRL method solves it with a tighter standard deviation than the A2C baseline.

This demonstrates that the weighted voting mechanism is able to outperform the 1-elite strategy. In contrast, the softmax ensemble approach does not surpass the 1-elite strategy in final performance.
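
As context for this comparison, the sketch below shows one way a weighted-vote ensemble and a softmax (distribution-averaging) ensemble could select actions for a discrete-action task such as CartPole. The `act`/`logits` member interfaces and the fitness-based weights are assumptions for illustration, not the project's actual code.

```python
# Illustrative ensemble action selection for discrete actions.
# Member policies are assumed to expose act(obs) -> int and
# logits(obs) -> torch.Tensor; the per-member weights are an assumption.
import torch

def weighted_vote_action(members, weights, obs):
    """Each member votes for an action; votes are weighted (e.g. by fitness)."""
    n_actions = members[0].logits(obs).shape[-1]
    votes = torch.zeros(n_actions)
    for policy, w in zip(members, weights):
        votes[policy.act(obs)] += w
    return int(votes.argmax())

def softmax_ensemble_action(members, obs):
    """Average the members' action distributions, then act greedily."""
    probs = torch.stack([torch.softmax(p.logits(obs), dim=-1) for p in members])
    return int(probs.mean(dim=0).argmax())
```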

Here we see that ablating the various mutation and recombination methods does not have a significant effect on the final performance or training stability of agents. In particular, the mean recombination strategy overlaps heavily with the unmodified base agent.

Requirements

  • Python 3.7
  • Libraries
    • numpy
    • gym
    • matplotlib
    • scipy
    • torch

Usage

Analysis Scripts

  • visualize_all.py [folder or file] [--f enables scanning of the whole folder]

    • Example: "visualize_all.py results/experiment_1_baseline --f"
    • Produces the primary visualizations shown above. Takes every experiment in the folder and plots the means and standard deviations of the runs (see the sketch after this list).
  • visualize_runs.py [file]

    • Example: "visualize_runs.py results/experiment_1_baseline/baselineA2C.p"
    • Plots each individual run in an experiment.
  • visualize_ablation [folder]

    • Example: "visualize_ablation results/experiment_3_ablation"
    • Creates the ablation graphs. Only works on the ablation folder.
  • compare_stat_sig.py [folder] [--f, required]

    • Example: "compare_stat_sig.py results/experiment_1_baseline --f"
    • Compares every pairing of the experiments in the folder using the defined statistical method.
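
As a rough illustration of what the primary visualization computes, the snippet below plots the mean and one standard deviation of episode rewards across runs. The reward matrix here is synthetic, and no assumption is made about the project's pickled result format.

```python
# Rough illustration of the mean/std-over-runs curves produced by the
# visualization scripts. The data below is synthetic.
import numpy as np
import matplotlib.pyplot as plt

# rewards[i, t] = episode reward of run i at episode t (synthetic example data)
rewards = np.random.randn(10, 200).cumsum(axis=1)

mean = rewards.mean(axis=0)
std = rewards.std(axis=0)
episodes = np.arange(rewards.shape[1])

plt.plot(episodes, mean, label="mean over runs")
plt.fill_between(episodes, mean - std, mean + std, alpha=0.3, label="±1 std")
plt.xlabel("episode")
plt.ylabel("reward")
plt.legend()
plt.show()
```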

Tuning

  • hparam_search.py [config file]
    • Example: "hparam_search.py configs/experiment_2_ensemble/earl_cart_strat_weightedvote.json"
    • Runs random search on the specified config. This requires significant configuration of the hparam_search file, so its use is not recommended (a sketch of the general approach is shown after this list).
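
For reference, random search amounts to sampling each hyperparameter from a specified range and scoring one training run per sample. The sketch below illustrates that idea only; the JSON schema and the `train(params) -> score` callable are hypothetical and do not reflect the actual hparam_search.py configuration.

```python
# Illustrative random hyperparameter search over ranges given in a JSON file.
# The config schema and the train(params) -> score callable are assumptions.
import json
import random

def random_search(config_path, train, n_trials=20, seed=0):
    rng = random.Random(seed)
    with open(config_path) as f:
        # e.g. {"lr": [1e-4, 1e-2], "sigma": [0.001, 0.1]}
        search_space = json.load(f)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in search_space.items()}
        score = train(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```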

Project Structure

  • /configs
    • Contains all of the configuration and hyperparameter details. Organized by experiment.
  • /earl
    • Primary program folder.
  • /graphics
    • Result images generated by the visualization scripts.
  • /logger
    • Custom logger hooked into the baselines to provide a fair comparison.
  • /results
    • The result logs of the project. Organized by experiment.
  • /utils
    • Scripts for hyperparameter search and statistics.
  • /visualization
    • Scripts for visualizing and comparing results.

Program Structure

  • ea.py

    • Handles parameter evolution. Uses the parameters of the network and the calculated gradients.
  • logger.py

    • Performs logging operations.
  • model.py

    • The primary torch model for execution and gradient calculation.
  • runner.py

    • Manages the experiment, run, and execution loops. Coordinates data passing between different components.
  • storage.py

    • Holds all of the model's experiences until the end of the episodes (see the sketch after this list).
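
As a point of reference for storage.py's role, the sketch below shows a minimal rollout-storage pattern: transitions are accumulated until the end of the episodes and cleared after the update. The field names and interface are assumptions, not the repository's actual class.

```python
# Minimal rollout storage sketch (illustrative only; storage.py may differ).
from dataclasses import dataclass, field

@dataclass
class RolloutStorage:
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    dones: list = field(default_factory=list)

    def add(self, obs, action, reward, done):
        """Append one transition collected during an episode."""
        self.observations.append(obs)
        self.actions.append(action)
        self.rewards.append(reward)
        self.dones.append(done)

    def clear(self):
        """Drop all stored experience once the update has been applied."""
        for buf in (self.observations, self.actions, self.rewards, self.dones):
            buf.clear()
```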