Evolutionary Algorithms Framework

Primary language: C++. License: Apache License 2.0 (Apache-2.0).

Darwin Neuroevolution Framework

Darwin is a framework intended to make Neuroevolution experiments easy, quick and fun. It provides building blocks, samples and tooling to avoid the repetitive (and potentially complex) scaffolding required to research new ideas.

The current implementation is a combination of portable C++ (running on Linux, Windows and macOS), augmented by a collection of Python scripts for post-processing recorded evolution traces.

Evolutionary Algorithms and Neuroevolution

Evolutionary Algorithms are a class of algorithms based on the idea that a few basic mechanisms, loosely inspired by biological evolution (selection, reproduction and mutation), can be the building blocks for efficient searches in complex problem spaces. In particular, we can use Evolutionary Algorithms to train artificial neural networks, an approach known as Neuroevolution.

Starting with a random initial population, we seek to evolve towards better solutions in an iterative fashion: each iteration (generation) attempts to combine (crossover) the most promising traits (selection) from the previous one, occasionally making random tweaks (mutation):

[Figure: Evolve!]

At a high level, the generic structure of an evolutionary algorithm can be as simple as:

    initialize_population
    while(not satisfied):
        for_each individual:
            evaluate_fitness
        next_generation:
            select_parents
            use crossover & mutation to generate children

This conceptual simplicity makes Evolutionary Algorithms attractive and easy to implement, although creating interesting domain-specific fitness functions and supporting a structured experimentation approach requires a lot of scaffolding: persisting experiment variations and results, visualizations, reports, profiling, etc. This is where the Darwin Framework comes in.
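
To make the generic loop above concrete, here is a minimal, self-contained C++ sketch. It is not Darwin code: the bit-string `Genome`, the toy `fitness` function (count the genes set to 1), the elitism step and the 20% mutation rate are all assumptions chosen to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Genome = std::vector<int>;  // toy encoding: one 0/1 value per gene

// Toy fitness (an assumption for this sketch): the number of genes set to 1.
int fitness(const Genome& genome) {
  return static_cast<int>(std::count(genome.begin(), genome.end(), 1));
}

// Runs the loop for at most max_generations and returns the fittest genome.
Genome evolve(std::size_t population_size, std::size_t genome_size,
              int max_generations, unsigned seed) {
  assert(population_size >= 2 && genome_size >= 1);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> coin(0, 1);
  std::uniform_int_distribution<std::size_t> any_gene(0, genome_size - 1);
  std::bernoulli_distribution mutate(0.2);

  // initialize_population: random bit strings
  std::vector<Genome> population(population_size, Genome(genome_size));
  for (auto& genome : population)
    for (auto& gene : genome) gene = coin(rng);

  const auto fitter = [](const Genome& a, const Genome& b) {
    return fitness(a) > fitness(b);
  };

  for (int generation = 0; generation < max_generations; ++generation) {
    // evaluate_fitness: rank the population, fittest first
    std::sort(population.begin(), population.end(), fitter);
    if (fitness(population.front()) == static_cast<int>(genome_size)) break;

    // select_parents: only the fitter half gets to reproduce
    const std::size_t parent_count =
        std::max<std::size_t>(2, population_size / 2);
    std::uniform_int_distribution<std::size_t> any_parent(0, parent_count - 1);

    std::vector<Genome> next_generation;
    next_generation.push_back(population.front());  // keep the best (elitism)
    while (next_generation.size() < population_size) {
      const Genome& a = population[any_parent(rng)];
      const Genome& b = population[any_parent(rng)];
      Genome child(genome_size);
      for (std::size_t i = 0; i < genome_size; ++i)
        child[i] = coin(rng) ? a[i] : b[i];        // uniform crossover
      if (mutate(rng)) child[any_gene(rng)] ^= 1;  // mutation: flip one gene
      next_generation.push_back(std::move(child));
    }
    population = std::move(next_generation);
  }

  std::sort(population.begin(), population.end(), fitter);
  return population.front();
}
```

Calling `evolve(20, 16, 200, 42u)` returns the best 16-gene genome found within 200 generations. Because the best genome is always copied into the next generation, the best fitness never decreases from one generation to the next.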

Darwin Neuroevolution Framework Overview

At the highest level, the core concepts are the Domain and the Population. The former describes what we're trying to solve, while the latter encapsulates the solution model(s) together with the specific evolutionary algorithm(s) used to search for better solutions.

The Domain and Population interfaces intentionally decouple the two concepts: domains don't know anything about the details of a particular population implementation, and the only thing a population knows about a domain is the number of inputs and outputs.

Domains

A Domain implementation defines the problem space: the "shape" of a solution (the number of inputs & outputs) and how to assign a fitness value to a particular solution.

In our case, a solution instance is encoded as a Genotype and it's evaluated indirectly through its phenotypic expression (the corresponding Brain).

For example, Pong is a domain implementation which simulates the classic 2-player arcade game. It defines 6 inputs and 2 outputs, and it calculates the fitness of every genotype in the population based solely on the results of a tournament between the individuals in the population (so the evolved solutions don't incorporate any a priori knowledge of what good gameplay looks like).
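
As an illustration of tournament-based fitness, the sketch below scores every individual from a round-robin tournament. This is an assumption-laden example, not Darwin's Pong implementation: the pairing scheme, the scoring (+1 for a win, -1 for a loss, 0 for a draw) and the names `PlayGame` and `tournamentFitness` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Plays one game between two individuals (identified by index).
// Returns +1 if the first one wins, -1 if the second one wins, 0 for a draw.
using PlayGame = std::function<int(std::size_t, std::size_t)>;

// Round-robin tournament: every individual plays every other one once.
// The fitness is the average game outcome, in the [-1, +1] range.
std::vector<float> tournamentFitness(std::size_t population_size,
                                     const PlayGame& play) {
  assert(population_size >= 2);
  std::vector<float> fitness(population_size, 0.0f);
  for (std::size_t i = 0; i < population_size; ++i) {
    for (std::size_t j = i + 1; j < population_size; ++j) {
      const int outcome = play(i, j);
      fitness[i] += static_cast<float>(outcome);
      fitness[j] -= static_cast<float>(outcome);
    }
  }
  const float games_per_player = static_cast<float>(population_size - 1);
  for (auto& value : fitness) value /= games_per_player;
  return fitness;
}
```

The scores depend only on how the individuals perform against each other, so no external notion of a "good" player is required.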

Populations

A Population is simply a set of Genotypes, together with the ability to generate new generations (after a Domain implementation evaluates and assigns fitness values to all the individual genotypes in the population).

The Genotype is an encoding for a particular solution and the "recipe" to construct the corresponding Brain (the "phenotype") with the number of inputs and outputs specified by the domain selected in the active experiment.
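
The genotype/phenotype split can be sketched as follows. The class shapes here are simplified assumptions for illustration rather than Darwin's actual declarations: `grow()`, `think()`, `LinearBrain` and `LinearGenotype` are hypothetical names, and the "brain" is just a single linear layer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Phenotype: the runnable model which maps inputs to outputs.
class Brain {
 public:
  virtual ~Brain() = default;
  virtual std::vector<float> think(const std::vector<float>& inputs) const = 0;
};

// Genotype: the encoded solution plus the "recipe" to build its Brain.
class Genotype {
 public:
  virtual ~Genotype() = default;
  virtual std::unique_ptr<Brain> grow() const = 0;
};

// Toy phenotype: a single linear layer (outputs = weights * inputs).
class LinearBrain : public Brain {
 public:
  LinearBrain(std::size_t inputs, std::size_t outputs,
              std::vector<float> weights)
      : inputs_(inputs), outputs_(outputs), weights_(std::move(weights)) {
    assert(weights_.size() == inputs_ * outputs_);
  }

  std::vector<float> think(const std::vector<float>& inputs) const override {
    assert(inputs.size() == inputs_);
    std::vector<float> outputs(outputs_, 0.0f);
    for (std::size_t o = 0; o < outputs_; ++o)
      for (std::size_t i = 0; i < inputs_; ++i)
        outputs[o] += weights_[o * inputs_ + i] * inputs[i];
    return outputs;
  }

 private:
  std::size_t inputs_;
  std::size_t outputs_;
  std::vector<float> weights_;  // row-major: one row of weights per output
};

// Toy genotype: the flat weight vector is the encoding.
class LinearGenotype : public Genotype {
 public:
  LinearGenotype(std::size_t inputs, std::size_t outputs)
      : inputs_(inputs), outputs_(outputs), weights_(inputs * outputs, 0.0f) {}

  std::vector<float>& weights() { return weights_; }

  std::unique_ptr<Brain> grow() const override {
    return std::make_unique<LinearBrain>(inputs_, outputs_, weights_);
  }

 private:
  std::size_t inputs_;
  std::size_t outputs_;
  std::vector<float> weights_;
};
```

In this sketch a domain would only ever call `grow()` and `think()`, so it never needs to know how a particular genotype encodes its solution.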

Summary

Here's how all these pieces fit together:

[Figure: Key Interfaces]

Using these interfaces, the general structure of the evolution driver code is illustrated below. This top-level evolution loop, with a few additions, is provided by the Darwin Framework, so we don't have to re-implement it for every new experiment:

    population->createPrimordialGeneration(population_size);
    while (domain->evaluatePopulation(population)) {
        population->rankGenotypes();
        population->createNextGeneration();
    }
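
To show the loop above running end to end, here is a self-contained toy version. Only the four method names used in the loop come from the snippet above. Everything else is an assumption made for illustration: the `Brain`, `Genotype`, `Population` and `Domain` definitions, the `runEvolution` helper, the "learn x0 - x1" task and the mutation scheme. None of it is Darwin's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical, simplified stand-ins (not Darwin's actual declarations).
struct Brain {
  std::vector<float> weights;
  float think(const std::vector<float>& inputs) const {  // weighted sum
    assert(inputs.size() == weights.size());
    return std::inner_product(weights.begin(), weights.end(), inputs.begin(), 0.0f);
  }
};

struct Genotype {
  std::vector<float> weights;
  float fitness = 0.0f;
  Brain grow() const { return Brain{weights}; }
};

class Population {
 public:
  // The only domain details a population sees: the input and output counts.
  Population(std::size_t inputs, std::size_t /*outputs: always 1 in this toy*/,
             unsigned seed)
      : inputs_(inputs), rng_(seed) {}

  void createPrimordialGeneration(std::size_t population_size) {
    assert(population_size >= 2);
    std::uniform_real_distribution<float> initial(-1.0f, 1.0f);
    genotypes_.assign(population_size, Genotype());
    for (auto& genotype : genotypes_) {
      genotype.weights.resize(inputs_);
      for (auto& w : genotype.weights) w = initial(rng_);
    }
  }

  void rankGenotypes() {  // best fitness first
    std::sort(genotypes_.begin(), genotypes_.end(),
              [](const Genotype& a, const Genotype& b) { return a.fitness > b.fitness; });
  }

  // Keeps the better half unchanged and refills the rest with mutated copies.
  // Assumes rankGenotypes() was called on the current generation.
  void createNextGeneration() {
    assert(genotypes_.size() >= 2);
    std::normal_distribution<float> noise(0.0f, 0.1f);
    const std::size_t survivors = genotypes_.size() / 2;
    for (std::size_t i = survivors; i < genotypes_.size(); ++i) {
      genotypes_[i] = genotypes_[i % survivors];
      for (auto& w : genotypes_[i].weights) w += noise(rng_);
    }
    ++generation_;
  }

  std::size_t size() const { return genotypes_.size(); }
  Genotype& genotype(std::size_t i) { return genotypes_[i]; }
  int generation() const { return generation_; }

 private:
  std::size_t inputs_;
  std::mt19937 rng_;
  std::vector<Genotype> genotypes_;
  int generation_ = 0;
};

// Toy domain: rewards brains whose output approximates x0 - x1.
struct Domain {
  int max_generations;
  std::size_t inputs() const { return 2; }
  std::size_t outputs() const { return 1; }

  // Assigns a fitness to every genotype; returns false when evolution should stop.
  bool evaluatePopulation(Population& population) const {
    const std::vector<std::vector<float>> samples = {
        {1.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f}, {2.0f, -1.0f}};
    for (std::size_t i = 0; i < population.size(); ++i) {
      const Brain brain = population.genotype(i).grow();
      float error = 0.0f;
      for (const auto& s : samples) {
        const float d = brain.think(s) - (s[0] - s[1]);
        error += d * d;
      }
      population.genotype(i).fitness = -error;  // smaller error = fitter
    }
    return population.generation() < max_generations;
  }
};

// Same shape as the driver loop above; returns the best fitness found.
float runEvolution(const Domain& domain, Population& population,
                   std::size_t population_size) {
  population.createPrimordialGeneration(population_size);
  while (domain.evaluatePopulation(population)) {
    population.rankGenotypes();
    population.createNextGeneration();
  }
  population.rankGenotypes();
  return population.genotype(0).fitness;
}
```

Note how the toy `Domain` only touches genotypes through `grow()` and `fitness`, while the toy `Population` only receives the input and output counts. That mirrors the decoupling described earlier, under the stated simplifications.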

This is everything you need to know in order to experiment with new problems (domains) or implement new evolutionary algorithms (populations). Everything else in the Darwin Framework exists to support these concepts: persistence for storing experiment results, UI, tracking and visualizing experiments, common building blocks, tools to analyze the results, etc.

For additional information see the full documentation.

Darwin Studio

Darwin Studio is a visual integrated environment used to create, run and visualize experiments:

[Figure: Darwin Studio]

Currently it's the main user-facing tool included in the Darwin Framework, although there are plans to add more options (for example a command-line driver and/or Python bindings). For post-processing experiment results there are Python scripts which can parse Darwin universe databases.

Running Experiments & The Universe Database

Every instance of an experiment is persisted in a Universe database, which is implemented as a single SQLite file. The key data model concepts are:

  • Universe: the persistent storage for a set of experiments.
  • Experiment: loosely speaking, a Domain / Population pair.
  • Variation: a specific set of configuration values for an experiment.
  • Trace: the recording of a particular experiment variation run.

Normally, each Domain and Population implementation comes with a set of configuration properties which can be edited before starting an experiment. For each set of values there's a Variation associated with the Experiment. Every time an experiment variation is started, a new Trace is created to record the history/results of the experiment.

The database schema models the structural relationships, while the actual configuration values and results are stored as JSON strings (the fields highlighted in green):

[Figure: Darwin Data Model]
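
The relationships between these concepts can be sketched with a small in-memory model. This is only an illustration of the structure described above (structural links as ids, configuration and results as JSON strings). The field names, the method names and the example values are assumptions, not the actual Darwin schema.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical records; the real schema may use different fields.
struct Experiment { int id; std::string domain; std::string population; };
struct Variation { int id; int experiment_id; std::string config_json; };
struct Trace { int id; int variation_id; std::string results_json; };

// In-memory stand-in for a universe: a set of experiments, each with
// variations (configuration values) and traces (recorded runs).
class Universe {
 public:
  int newExperiment(std::string domain, std::string population) {
    const int id = static_cast<int>(experiments_.size()) + 1;
    experiments_.push_back({id, std::move(domain), std::move(population)});
    return id;
  }

  int newVariation(int experiment_id, std::string config_json) {
    assert(experiment_id >= 1 &&
           experiment_id <= static_cast<int>(experiments_.size()));
    const int id = static_cast<int>(variations_.size()) + 1;
    variations_.push_back({id, experiment_id, std::move(config_json)});
    return id;
  }

  // Every run of a variation gets its own trace.
  int newTrace(int variation_id) {
    assert(variation_id >= 1 &&
           variation_id <= static_cast<int>(variations_.size()));
    const int id = static_cast<int>(traces_.size()) + 1;
    traces_.push_back({id, variation_id, "{}"});
    return id;
  }

  std::size_t traceCount(int variation_id) const {
    return static_cast<std::size_t>(
        std::count_if(traces_.begin(), traces_.end(), [=](const Trace& t) {
          return t.variation_id == variation_id;
        }));
  }

 private:
  std::vector<Experiment> experiments_;
  std::vector<Variation> variations_;
  std::vector<Trace> traces_;
};
```

For example, starting the same variation twice would produce two traces that share one `variation_id`, which is what makes it possible to compare repeated runs of identical configurations.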

Getting Started

Related projects

It's worth mentioning a few similar projects. Many of them are focused on RL rather than EA (while some cover both and more) but they overlap in interesting ways with the Darwin Framework:

Darwin Framework was created with the following goals and design principles in mind:

  • First-class support for Evolutionary Algorithms concepts (population, generation, genotype/phenotype), without specializing in a particular EA flavor. This allows simple interfaces for implementing new algorithms (and domains) while accommodating a wide variety of algorithms (Neuroevolution, GP, GEP, ...).
  • Capable of running interesting experiments on easily available hardware (no expensive GPU or data center required). There are plans to take advantage of both GPUs and distributed platforms in the future.
  • Complete package: creating & running experiments, visualizing the progress and the final results, interacting with the solutions, and more. Currently Darwin Studio is the central part of the framework and it aims to offer a familiar integrated environment UI (it may be worth noting that the Darwin Framework would score well against John R. Koza's wish list mentioned in the seminal Genetic Programming book).
  • Structured approach to experimentation: all the experiment runs are automatically recorded, including the experiment variation "lineage".
  • Cross-platform with minimal external dependencies.

This is not an officially supported Google product.