
Open source experiment tracking and optimization for machine learning


Guild AI


This is the source repository for Guild AI.

Guild AI is an open source toolkit that automates and optimizes machine learning experiments.

  • Run unmodified training scripts, capturing each run result as a unique experiment
  • Automate trials using grid search, random search, and Bayesian optimization
  • Compare and analyze runs to understand and improve models
  • Back up training-related operations such as data preparation and testing
  • Archive runs to S3 or other remote systems
  • Run operations remotely on cloud accelerators
  • Package and distribute models for easy reproducibility

For more on features, see Guild AI - Features.


Requirements

  • Linux, macOS, Windows (Windows requires Docker)
  • Python 2.7 or 3 with pip and virtualenv or Conda
  • TensorFlow

Install Guild AI

To install Guild AI with pip, run:

pip install guildai

If you already have Guild AI installed, you can upgrade it to the latest release using:

pip install guildai --upgrade

To install pre-release versions (e.g. to get the latest features or fixes), include the --pre command line option:

pip install guildai --upgrade --pre

NOTE: If you get file permission errors when running pip install, you may need to run pip as a privileged user (e.g. using sudo) or else specify the --user command line option.

Ensure that you have TensorFlow installed.

NOTE: As of Guild 0.6, all major computational and ML frameworks are supported including PyTorch, Keras, MLKit, scikit-learn, and XGBoost. TensorFlow is required for TensorBoard integration. Your models do not have to use TensorFlow to work with Guild AI. The TensorFlow requirement will be removed in future releases of Guild.

For detailed install instructions, see Install Guild AI.

Quick Start

In this Quick Start guide, we create a mock training script and run it to illustrate the following features:

  • Run, capture and compare experiments
  • Use grid search and random search to explore hyperparameter space
  • Use Bayesian optimization to achieve the best performance

Links to more advanced topics are provided at the end of this section.

Mock training script

In a new project directory, create a file named train.py:

import numpy as np

x = 0.1
noise = 0.1

loss = (np.sin(5 * x) * (1 - np.tanh(x ** 2)) + np.random.randn() * noise)

print("x: %f" % x)
print("noise: %f" % noise)
print("loss: %f" % loss)

NOTE: This is a mock (fake) training script — it doesn't train anything! However, it illustrates the basics of training, where an operation is used to minimize loss given a set of inputs. In this example, we calculate loss using a noisy function and a single input x, which represents our hyperparameter.

Run an experiment

Open a command console and change to the project directory.

Use Guild to run the mock training script:

guild run train.py

Guild prompts you with the default values as defined in train.py. Press Enter to run the script.

NOTE: Guild automatically detects the flags defined in train.py and uses the default values. Later we run train.py with different values for x and even use Guild to find values that minimize the function loss.
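To make the idea of flag detection concrete, here is a minimal sketch. It is not Guild's actual implementation. It assumes that a flag is any top-level assignment of a literal value, and the detect_flags helper is a hypothetical name used only for illustration:

```python
import ast

# The top of train.py from this guide, embedded as a string for the example.
SOURCE = '''
import numpy as np

x = 0.1
noise = 0.1

loss = (np.sin(5 * x) * (1 - np.tanh(x ** 2)) + np.random.randn() * noise)
'''


def detect_flags(source):
    """Return {name: default} for top-level assignments of literal values.

    Assumption for this sketch: a "flag" is a module-level `name = literal`.
    """
    flags = {}
    for node in ast.parse(source).body:
        # Only consider simple `name = value` statements at module level.
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if not isinstance(target, ast.Name):
            continue
        try:
            flags[target.id] = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            # Not a literal (e.g. the `loss` expression), so not a flag.
            continue
    return flags


flags = detect_flags(SOURCE)
print(flags)
```

Under this assumption, x and noise are picked up with their defaults, while loss is skipped because it is computed rather than assigned a literal.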

When Guild runs a script, it captures the run as a unique experiment. You can list runs by running:

guild runs

View information for this run using:

guild runs info

By default, Guild shows information for the latest run.

NOTE: For information on any Guild command, use: guild COMMAND --help. For a list of commands, use guild --help. Online help is available at https://guild.ai/docs/commands/.

Experiments are saved on disk within file system directories. List files associated with the latest run using:

guild ls

In the case of our mock training script, the files list is empty because the script doesn't generate any files. However, you can see the path where the run is located.

Guild stores information associated with each run in files located in the .guild subdirectory of each run. You can list all of the files associated with the latest run, including Guild files, by specifying the --all command line option:

guild ls --all

If you want to export an experiment, use the export command:

guild export /tmp/my-experiments

This is useful for creating archives of runs that can be imported by you and others using the import command. For a more advanced example of backing runs up to the cloud, see Get Started - Backup and Restore.

You can list runs located in an archive directory by specifying the --archive command line option:

guild runs --archive /tmp/my-experiments

Run a second experiment

In the command console, run:

guild run train.py x=0.2

Press Enter to confirm the operation.

Guild runs train.py a second time using a new value for x.

Compare the two runs using compare:

guild compare

Guild Compare is a spreadsheet-like application that lets you view experiment results. Use the cursor keys to navigate between columns. Sort a column in ascending order by pressing 1 and in descending order by pressing 2. Press ? to view a list of key bindings for Guild Compare.

When you're done comparing the runs, press q to return to the command prompt.

Run multiple trials using grid and random search

Guild supports running multiple trials using various methods.

First, use Guild to run three trials over a discrete search space for x (i.e. a grid search, or parameter sweep):

guild run train.py x=[-2.0,0.0,2.0]

Press Enter to confirm. Guild generates three trials, one for each specified value of x.
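As a rough illustration of what a grid search enumerates, the sketch below expands list-valued flags into one trial per value. It is not Guild's code. The expand_grid helper is hypothetical, and the Cartesian-product behavior for more than one list-valued flag is an assumption of this sketch:

```python
from itertools import product


def expand_grid(flags):
    """Expand list-valued flags into one dict of flag values per trial.

    Scalar values are held fixed. If several flags are lists, this sketch
    takes their Cartesian product (an assumption, see the text above).
    """
    names = list(flags)
    value_lists = [v if isinstance(v, list) else [v] for v in flags.values()]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Mirrors: guild run train.py x=[-2.0,0.0,2.0]
trials = expand_grid({"x": [-2.0, 0.0, 2.0], "noise": 0.1})
for trial in trials:
    print(trial)
```

With a single list of three values, this yields three trials, matching the three runs generated by the command above.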

Next, run three trials using random search over a uniform distribution of x:

guild run train.py x=uniform[-1.0:1.0] --max-trials 3

Press Enter to run the trials. Guild generates another three trials, according to the command line option --max-trials.
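Conceptually, random search draws each trial's value of x independently from the stated range. The following sketch shows that idea using only the Python standard library. The sample_uniform_trials helper and its seed parameter are hypothetical and are not part of Guild:

```python
import random


def sample_uniform_trials(low, high, max_trials, seed=None):
    """Draw max_trials independent values of x uniformly from [low, high]."""
    rng = random.Random(seed)
    return [{"x": rng.uniform(low, high)} for _ in range(max_trials)]


# Mirrors: guild run train.py x=uniform[-1.0:1.0] --max-trials 3
random_trials = sample_uniform_trials(-1.0, 1.0, max_trials=3, seed=0)
for trial in random_trials:
    print(trial)
```

Unlike the grid search, the sampled values differ from one invocation to the next unless a seed is fixed.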

Compare the runs again:

guild compare --table

This time we use the --table command line option for compare. This tells Guild to print the results as a table rather than run interactively.

Sort the results by loss in ascending order by running:

guild compare --table --min loss

Limit the results to the top three results (i.e. the three results with the lowest loss) by using the --top option:

guild compare --table --min loss --top 3
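The combined effect of --min and --top can be thought of as "sort ascending by a column, then keep the first N rows". The sketch below applies that logic to a few made-up runs. The run names and loss values are invented for illustration and are not output from Guild:

```python
# Hypothetical run results; the values are illustrative only.
runs = [
    {"run": "a", "x": 2.0, "loss": 0.52},
    {"run": "b", "x": -1.0, "loss": -0.31},
    {"run": "c", "x": -0.3, "loss": -0.88},
    {"run": "d", "x": 0.0, "loss": 0.10},
    {"run": "e", "x": -0.5, "loss": -0.64},
]


def top_by_min(rows, column, top):
    """Sort rows by column in ascending order and keep the first `top` rows."""
    return sorted(rows, key=lambda row: row[column])[:top]


best = top_by_min(runs, "loss", top=3)
for row in best:
    print(row["run"], row["x"], row["loss"])
```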

Optimize loss using Bayesian methods

As you can see from the previous sections, Guild specializes in running, capturing, and comparing experiments. In this section, we demonstrate how this facility can be further used to optimize hyperparameters.

Let's try to find values for x that minimize loss. Because our mock training script is noisy (to simulate machine learning processes that are inherently noisy) we cannot find a single value for x where loss is always lowest. Nonetheless, we can find a range of values that tend to produce better results.
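To see why a single run is not enough to judge a value of x, the sketch below evaluates the same loss formula many times at a fixed x. It is a standard-library rewrite of the function in train.py (math and random in place of NumPy), and the seed and sample count are arbitrary choices for this example:

```python
import math
import random


def noisy_loss(x, noise=0.1, rng=random):
    """Same formula as train.py: a deterministic term plus Gaussian noise."""
    return math.sin(5 * x) * (1 - math.tanh(x ** 2)) + rng.gauss(0, 1) * noise


rng = random.Random(0)  # fixed seed so the example is repeatable
samples = [noisy_loss(-0.3, rng=rng) for _ in range(1000)]

mean_loss = sum(samples) / len(samples)
print("mean loss at x=-0.3: %f" % mean_loss)
print("lowest sample:  %f" % min(samples))
print("highest sample: %f" % max(samples))
```

Individual samples scatter around the mean, so two runs with the same x can report noticeably different loss values.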

In the command console, run:

guild run train.py x=uniform[-2.0:2.0] --optimizer bayesian --max-trials 20

Press Enter to continue. Guild runs 20 trials, trying values for x with the goal of minimizing loss. By default, Guild uses a Bayesian method with Gaussian processes to explore values for x that have a higher likelihood of producing lower values of loss. You can specify other optimizers including forest (decision trees) and gbrt (gradient boosted trees).

NOTE: Guild's built-in Bayesian optimization support is implemented using the excellent skopt library. In future releases, Guild will support additional optimization libraries, including optimizers you define yourself.

For more information on using Bayesian optimizers in Guild, see Get Started - Bayesian Optimization.

Compare all of the runs:

guild compare

In this case we run compare in interactive mode. To quickly find runs with the lowest loss, use the cursor keys to navigate to the loss column and press 1 to sort in ascending order.

Values for x that are close to -0.3 should be listed toward the top.

The function defined in train.py (again, a mock function used for illustration purposes; it doesn't actually train anything) can be used to plot the relationship between x and loss:

[Figure: loss as a function of x]

We indeed see that loss is lowest where x is around -0.3.
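The location of the minimum can also be checked numerically. The sketch below drops the noise term from the function in train.py and evaluates it on an evenly spaced grid over the same range used for the Bayesian search. The grid spacing of 0.01 is an arbitrary choice for this example:

```python
import math


def noise_free_loss(x):
    """The deterministic part of the loss in train.py (noise term removed)."""
    return math.sin(5 * x) * (1 - math.tanh(x ** 2))


# Evaluate on a grid from -2.0 to 2.0 in steps of 0.01.
xs = [-2.0 + i * 0.01 for i in range(401)]
best_x = min(xs, key=noise_free_loss)

print("x with lowest noise-free loss: %.2f" % best_x)
print("loss at that x: %f" % noise_free_loss(best_x))
```

The reported x should fall close to -0.3, in line with the results from the optimizer.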

Press q to exit Guild Compare.

Clean up

If you want to delete a run, use:

guild runs rm RUN_ID_OR_INDEX

You can get the RUN_ID using guild runs. You can also specify a run INDEX, which is included in the runs list.

To delete all runs, use:

guild runs rm

You can restore deleted runs using the restore command.

For more complete coverage of managing runs with Guild, see Get Started - Manage Runs.

Learn more

Refer to the Guild AI website for more information on Guild.

For more step-by-step tutorials, see the other Get Started guides.

For a complete list of commands supported by Guild, see https://guild.ai/docs/commands/.

Get help

If you have questions or are facing problems, please contact us on Guild AI Slack or open an issue on GitHub.