`bocas` is an opinionated open source framework for organizing,
orchestrating, and ultimately publishing research experiments.
Some design highlights of `bocas` include:
- the ability to cache artifacts between experiment runs
- the decoupling of plot generation and training jobs

`bocas` augments the `ml_collections` library to allow you to:

- describe an array of experiments in a single config
- run all of the experiments with a single command
- gather artifacts from the experiments
- aggregate the results into plots, tables, and figures for use in your final report
- easily combine results from multiple experiments
- and more!
Examples:

- Basic: Oxford 102 flowers classification example
- Intermediate: Object detection benchmarks with KerasCV

## Overview
Using `bocas` is easy! To get started, you need to be familiar with a few
concepts, and this overview covers everything you need to know. To jump right
into things, check out the Oxford 102 flowers classification example.
In the mental model of `bocas`, there exist Tasks and Tactics. A Task is
something like "classify images from MNIST", "cluster samples into N classes",
or "perform generative learning in X style".
A Tactic refers to the combination of all the details used to produce a
solution to a Task. For example, one Tactic for solving MNIST classification
might be to train a ResNet50V2 on data augmented with AugMix.
Typically, a publishable result requires numerous baseline Tactics to benchmark
your novel Tactic against. A research work will also usually span many Tasks:
the overall goal of the paper is to benchmark a new Tactic's ability to solve a
variety of Tasks.
`bocas` is structured around this idea: you will have at least one Task, and
each Task may be solved by numerous Tactics. As such, I recommend breaking your
codebase down at the Task level, splitting your paper's artifacts along the
same lines. For example, a classification paper might have the structure:
```
tasks/
  mnist/
    ...
  imagenet/
    ...
```
Keeping these concepts in mind, `bocas` recommends that you structure your code
into three levels:

- `library/` holds anything unique to your report/paper/publication. This might
  include a new augmentation, a new `keras.Layer`, a new loss function, or a
  new metric.
- `tasks/` holds all of the tasks to benchmark your new technique on.
- `paper/` holds the LaTeX or Markdown code required to render your paper.
- `paper/artifacts/` is a subdirectory of `paper/` that holds all of the
  artifacts produced by the `tasks/`. Typically, when running a Task sweep
  you'll want to provide this directory to your scripts.
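Putting it together, a repository following this layout might look like the
sketch below (file names other than the three top-level directories are
illustrative):

```
library/
  ...
tasks/
  oxford_102/
    ...
paper/
  paper.md
  artifacts/
    ...
```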
Your tasks should be structured as follows. All code for a task should reside
in `tasks/{task}/`, e.g. `tasks/oxford_102/`.
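Under this convention, a single task directory ends up containing the files
described in the rest of this overview, something like the following sketch
(the exact layout is up to you):

```
tasks/oxford_102/
  run.py     # defines run(config) for a single experiment
  sweep.py   # defines the ml_collections config with bocas.Sweep values
  scripts/
    create_plots.py
```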
You should create a `run.py` script. This script must define a `run()` function
that accepts an `ml_collections.ConfigDict` as its first positional argument.
If you follow the pattern in the Oxford Flowers 102 example, your `run.py` file
will support both independent runs and large-scale sweeps:
```python
import bocas
import keras_cv
import tensorflow as tf
import tensorflow_datasets as tfds


def run(config):
    name = f"{config.optimizer}"
    # TFDS publishes the Oxford flowers dataset as "oxford_flowers102".
    train_ds, test_ds = tfds.load(
        "oxford_flowers102", as_supervised=True, split=["train", "test"]
    )
    # Images come in varying sizes; resize and batch before training.
    resize = lambda image, label: (tf.image.resize(image, (224, 224)), label)
    train_ds = train_ds.map(resize).batch(32)
    model = keras_cv.models.ResNet50V2(
        include_rescaling=True,
        include_top=True,
        classes=102,
    )
    # Labels are integer class ids, so use sparse categorical cross-entropy.
    model.compile(
        loss="sparse_categorical_crossentropy", optimizer=config.optimizer
    )
    history = model.fit(train_ds, epochs=10)
    return bocas.Result(
        name=name,
        artifacts=[
            bocas.artifacts.KerasHistory(history, name="fit_history"),
        ],
    )
```
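For an independent run, `run()` is an ordinary function, so you can invoke it
yourself with a hand-built config. Below is a minimal sketch of such an entry
point; the config field is just the one this example reads, and bocas's own
sweep runner does not require this block:

```python
# Hypothetical standalone entry point; lets you launch a single
# experiment with `python run.py` without involving a sweep.
if __name__ == "__main__":
    import ml_collections

    config = ml_collections.ConfigDict()
    config.optimizer = "adam"
    run(config)
```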
Once you are happy with the results from a single `run.py` run, create a
`sweep.py` config file. In `sweep.py`, specify an `ml_collections.ConfigDict`
containing `bocas.Sweep` objects for any value you'd like to sweep over.
```python
import ml_collections

import bocas

config = ml_collections.ConfigDict()
config.static_value = 'any-string-or-int-or-float-or-python-object'
config.optimizer = bocas.Sweep(['sgd', 'adam'])
```
Any time a value of type `bocas.Sweep` is encountered, `bocas` expands the
config into the Cartesian product of that sweep's values and those of every
other `bocas.Sweep` parameter, and runs each resulting configuration.
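To make the expansion concrete, here is a plain-Python sketch (not bocas
internals) that enumerates the runs two hypothetical sweeps would produce:

```python
import itertools

# Two hypothetical sweeps, written as plain lists:
learning_rates = [0.01, 0.1]
optimizers = ['sgd', 'adam']

# One experiment per element of the Cartesian product: 2 * 2 = 4 runs.
for lr, opt in itertools.product(learning_rates, optimizers):
    print(f"run with learning_rate={lr}, optimizer={opt}")
```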
Be careful with this! It is easy to create a lot of experiments:
```python
config = ml_collections.ConfigDict()
config.learning_rate = bocas.Sweep([x / 100 for x in range(5, 21)])
config.optimizer = bocas.Sweep(['sgd', 'adam'])
config.model = bocas.Sweep(
    ['resnet50', 'resnet50v2', 'densenet101', 'efficientnet']
)
```
This configuration already contains `16 * 2 * 4`, or `128`, runs! That is
probably way more than you'd like. Rather than one all-encompassing sweep, try
to define a few focused experiments: run hyperparameter sweeps separately, and
hardcode the chosen values into the final runs that are used to produce the
charts.
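For example, once a separate learning-rate sweep has settled on a value, the
final `sweep.py` can hardcode it and sweep only the axis you actually want to
compare (the specific values here are illustrative):

```python
import ml_collections

import bocas

config = ml_collections.ConfigDict()
config.learning_rate = 0.05  # chosen in an earlier, separate sweep
config.optimizer = 'adam'    # likewise hardcoded from a prior sweep
config.model = bocas.Sweep(['resnet50', 'resnet50v2'])  # only 2 runs
```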
After all of your runs are complete, create some charts and plots. Save them to
the designated directory under `paper/` so that they are rendered into your
updated paper.

I recommend writing a script that produces the desired plots from the saved
artifacts, and that can be run entirely separately from the experiments
themselves. An example of this can be found in the `oxford_102` example:
```python
# scripts/create_plots.py
import bocas
import luketils

paper_dir = "paper"  # directory holding your paper and its artifacts

results = bocas.Result.load_collection("artifacts/")
metrics_to_plot = {}
for experiment in results:
    metrics = experiment.get_artifact("fit_history").metrics
    metrics_to_plot[f"{experiment.name} Train"] = metrics["accuracy"]
    metrics_to_plot[f"{experiment.name} Validation"] = metrics["val_accuracy"]

luketils.visualization.line_plot(
    metrics_to_plot,
    path=f"{paper_dir}/results/combined-accuracy.png",
    title="Model Accuracy",
)
```
Check out the full code in `oxford_102`.
That's all it takes to get running with `bocas`. Please check out the
`examples/` directory for more reading; it contains a few more patterns that
might be useful in structuring your experiments.
## bocas is under active development

While the API is relatively straightforward and simple, `bocas` currently
lacks support for multi-worker experiment runs. This means that you will need
to run all of your experiments on a single machine. If you are running 10-20
`fit()` loops to convergence, this will likely be an extremely expensive
process.

Personally, I'd rather just wait for my experiments to run than fiddle with a
ton of infrastructure. That being said, I mainly run small-scale research. If
someone wants to contribute distributed runs, feel free!
Contributions to `bocas` are more than welcome. Please see the GitHub issue
tracker, and feel free to pick up any issue annotated with "Contribution
Welcome". Additionally, bug reports are not only welcome but encouraged.
Help me improve `bocas`! I made this project because I needed the tool, and
I'm sure many others do as well. If you find it helpful, please toss a GitHub
star on the repo and follow me on Twitter.
Thank you to all of our GitHub contributors!