ESTool

Evolved Biped Walker.

Implementation of various Evolution Strategies, such as GA, PEPG, CMA-ES and OpenAI's ES using common interface.

CMA-ES is wrapping around pycma.

Backround Reading:

A Visual Guide to Evolution Strategies

Evolving Stable Strategies

Using Evolution Strategies Library

To use es.py, please check out the simple_es_example.ipynb notebook.

The basic concept is:

solver = EvolutionStrategy()
while True:

  # ask the ES to give us a set of candidate solutions
  solutions = solver.ask()

  # create an array to hold the solutions.
  # solver.popsize = population size
  rewards = np.zeros(solver.popsize)

  # calculate the reward for each given solution
  # using your own evaluate() method
  for i in range(solver.popsize):
    rewards[i] = evaluate(solutions[i])

  # give rewards back to ES
  solver.tell(rewards)

  # get best parameter, reward from ES
  reward_vector = solver.result()

  if reward_vector[1] > MY_REQUIRED_REWARD:
    break

Parallel Processing Training with MPI

Please read Evolving Stable Strategies article for more demos and use cases.

To use the training tool (relies on MPI):

python train.py bullet_racecar -n 8 -t 4

will launch training jobs with 32 workers (using 8 MPI processes). the best model will be saved as a .json file in log/. This model should train in a few minutes on a 2014 MacBook Pro.

If you have more compute and have access to a 64-core CPU machine, I recommend:

python train.py name_of_environment -e 16 -n 64 -t 4

This will calculate fitness values based on an average of 16 random runs, on 256 workers (64 MPI processes x 4). In my experience this works reasonably well for most tasks inside config.py.

After training, to run pre-trained models:

python model.py bullet_ant log/name_of_your_json_file.json

bullet_ant pybullet environment. PEPG.

Another example: to run a minitaur duck model, run this locally:

python model.py bullet_minitaur_duck zoo/bullet_minitaur_duck.cma.256.json

Custom Minitaur Env.

In the .hist.json file, and on the screen output, we track the progress of training. The ordering of fields are:

generation count
time (seconds) taken so far
average fitness
worst fitness
best fitness
average standard deviation of params
average timesteps taken
max timesteps taken

Using plot_training_progress.ipynb in an IPython notebook, you can plot the traning logs for the .hist.json files. For example, in the bullet_ant task:

Bullet Ant training progress.

You need to install mpi4py, pybullet, gym etc to use various environments. Also roboschool/Box2D for some of the OpenAI gym envs.

On Windows, it is easiest to install mpi4py as follows:

Download and install mpi_x64.Msi from the HPC Pack 2012 MS-MPI Redistributable Package
Install a recent Visual Studio version with C++ compiler
Open a command prompt

git clone https://github.com/mpi4py/mpi4py
cd mpi4py
python setup.py install

Modify the train.py script and replace mpirun with mpiexec and -np with -n

MaveriQ/estool

ESTool

Backround Reading:

Using Evolution Strategies Library

Parallel Processing Training with MPI