simple-es

Simple implementations of multi-agent evolutionary strategies using minimal dependencies.

Simple-es is designed to help you quickly understand evolutionary learning through code, so an easy-to-understand code structure came first, while the project still offers strong features.

Latest check date: Aug. 11, 2021


This project has 4 main features:

  1. Evolutionary strategies with Gym environments
  2. Recurrent neural network support
  3. PettingZoo multi-agent environment support
  4. Wandb sweep hyperparameter search support

NOTE: If you want a NEAT algorithm that has the same design pattern as simple-es and performs more powerful distributed processing using mpi4py, visit pyNeat.

Algorithms

We implemented the three algorithms below:

  • simple_evolution: Uses Gaussian noise to generate offspring and applies the average of the offspring weights to the next parent (mu) model. A minimal sketch follows this list.
  • simple_genetic: Uses Gaussian noise to generate offspring for N parent models, and adopts the N best-performing offspring as the next parent models. No mutation process is implemented.
  • openai_es: The evolutionary strategy proposed by OpenAI in 2017 to address problems of reinforcement learning. Visit the link for more information.
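
To make the simple_evolution update concrete, here is a minimal sketch of one generation. The names (mu, fitness_fn, pop_size, sigma) are illustrative assumptions, not the repository's actual API:

import numpy as np

def simple_evolution_step(mu, fitness_fn, pop_size=64, sigma=0.05):
    # mu: flattened parent weights (1-D array); fitness_fn maps a weight
    # vector to an episode return. Both names are hypothetical.
    noise = np.random.randn(pop_size, mu.size) * sigma
    offspring = mu + noise                       # Gaussian-perturbed offspring
    scores = np.array([fitness_fn(w) for w in offspring])
    next_mu = offspring.mean(axis=0)             # offspring average -> next parent
    return next_mu, scores.max()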

Recurrent Neural Network with POMDP Environments

A recurrent ANN (GRU) is also implemented by default. Use of the GRU module can be set in the config file. On the environment side, LunarLander and CartPole support a POMDP setting.

network:
  gru: True
env:
  name: "CartPole-v1"
  pomdp: True

The config file conf/lunarlander_openai.yaml is set up to run in a POMDP setting, and it learns very well. You can try it by running the command below:

python run_es.py --cfg-path conf/lunarlander_openai.yaml
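
For reference, a POMDP variant of a Gym environment is commonly built by masking the velocity entries of the observation so the agent must rely on memory. The sketch below illustrates that idea; the indices and wrapper are assumptions, and the repository's own implementation may differ:

import gym
import numpy as np

class POMDPWrapper(gym.ObservationWrapper):
    # Hide velocity terms so the agent needs memory; indices (1, 3)
    # are CartPole-v1's velocity entries (an illustrative assumption).
    def __init__(self, env, hidden_indices=(1, 3)):
        super().__init__(env)
        self.hidden_indices = list(hidden_indices)

    def observation(self, obs):
        obs = np.array(obs, dtype=np.float32)
        obs[self.hidden_indices] = 0.0  # zero out the unobserved state
        return obs

env = POMDPWrapper(gym.make("CartPole-v1"))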

POMDP CartPole benchmarks

The GRU agent with the simple-evolution strategy (green) achieves a perfect score (500) in the POMDP CartPole environment, whereas the ANN agent (yellow) scores nearly 60 and fails to learn it. The GRU agent with the simple-genetic strategy (purple) also shows poor performance.

PettingZoo Multi-Agent Environments

Three PettingZoo environments are currently implemented: simple_spread, waterworld, and multiwalker. But you can easily add other PettingZoo environments by modifying envs/pettingzoo_wrapper.py. You can try the simple_spread environment by running the command below:

python run_es.py --cfg-path conf/simplespread.yaml
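
To wire up a new environment yourself, the core evaluation loop looks roughly like the sketch below. It assumes the pre-1.22 PettingZoo parallel API and a single policy shared across agents; the actual wrapper in envs/pettingzoo_wrapper.py handles more than this (e.g., recurrent hidden state):

from pettingzoo.mpe import simple_spread_v2

def evaluate_shared_policy(policy, episodes=1):
    # policy: assumed callable mapping one agent's observation to its action.
    env = simple_spread_v2.parallel_env()
    total = 0.0
    for _ in range(episodes):
        observations = env.reset()
        dones = {agent: False for agent in env.agents}
        while not all(dones.values()):
            actions = {a: policy(o) for a, o in observations.items()}
            observations, rewards, dones, infos = env.step(actions)
            total += sum(rewards.values())
    return total / episodes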

Wandb Sweep hyperparameter search

Wandb Sweep is a hyperparameter search tool provided by wandb. It automatically finds the best hyperparameters for the selected environment and strategy. The hyperparameters for LunarLander with the POMDP setting (conf/lunarlander_openai.yaml) are a good example of values found quickly through the sweep function.

There is an example config file in sweep_config/ that you can use to try wandb sweep. It can be run as follows:

> wandb sweep sweep_config/lunarlander_openaies.yaml
# The command above will automatically create a sweep project and then print the execution command.
# ex) wandb: Run sweep agent with: wandb agent <sweep project name>
> wandb agent <sweep project name>
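
The same workflow is available through wandb's Python API. The sketch below is a hedged illustration; the metric name and parameter keys are assumptions, not the contents of the repository's sweep config:

import wandb

sweep_config = {
    "method": "bayes",  # search strategy; chosen here for illustration
    "metric": {"name": "score", "goal": "maximize"},  # assumed metric name
    "parameters": {
        "sigma": {"min": 0.01, "max": 0.5},           # assumed keys, not the
        "offspring_num": {"values": [64, 128, 256]},  # repository's actual ones
    },
}

sweep_id = wandb.sweep(sweep_config, project="simple-es")

def train():
    run = wandb.init()
    cfg = run.config  # hyperparameters sampled for this trial
    # ... launch the training entry point with cfg here ...
    run.finish()

wandb.agent(sweep_id, function=train)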

Visit here for more information about wandb sweep.

Installation

Prerequisite

You need the following library:

> sudo apt install swig # for box2d-py

We recommend installing in a virtual environment to avoid any dependency issues.

# recommend python==3.8.10
> git clone https://github.com/jinPrelude/simple-es.git
> cd simple-es
> pip install -r requirements.txt

Increase thread limit

To train with more than 100 offspring, you may need to increase the system's Python thread limit. Since this is a fundamental limitation of Python, you can raise the limit by modifying /etc/security/limits.conf:

> sudo vim /etc/security/limits.conf

and add the codes below:

*               soft    nofile          65535
*               hard    nofile          65535

Save and quit the file with the vim commands Esc, :wq, Enter, then reboot your computer.
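
After rebooting, you can verify that the new limit is active with a quick check (a small convenience snippet, not part of the repository):

import resource

# RLIMIT_NOFILE corresponds to the nofile entries set in limits.conf.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")  # expect 65535 / 65535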

Train

# training LunarLander-v2
> python run_es.py --cfg-path conf/lunarlander.yaml 

# training BipedalWalker-v3
> python run_es.py --cfg-path conf/bipedal.yaml --log

You need a wandb account for logging (enabled with the --log flag). Wandb provides various useful logging features for free.

Test saved model

# test LunarLander-v2 with a saved checkpoint
> python test.py --cfg-path conf/lunarlander.yaml --ckpt-path <saved-model-dir> --save-gif