`bocas` is an opinionated open source framework for organizing,
orchestrating, and ultimately publishing research experiments.
Some design highlights of `bocas` include:
- the ability to cache artifacts between experiment runs
- the decoupling of plot generation and training jobs

`bocas` augments the `ml_collections` library to allow you to:

- describe an array of experiments in a single config
- run all of the experiments with a single command
- gather artifacts from the experiments
- aggregate the results into plots, tables, and figures for use in your final report
- easily combine results from multiple experiments
- and more!
Examples:

- Basic: Oxford 102 flowers classification example
- Intermediate: Object detection benchmarks with KerasCV

## Overview
Using `bocas` is easy! To get started, you need to be familiar with a few
concepts, and this overview covers everything you need to know. To jump right
into things, check out the Oxford 102 flowers classification example.
In the mental model of `bocas`, there exist Tasks and Tactics. A Task is
something like "classify images from MNIST", "cluster samples into N classes",
or "perform generative learning in X style".
A Tactic refers to the combination of all the details used to produce a
solution to a Task. For example, one Tactic for solving MNIST classification
might be to train a ResNet50V2 on data augmented with AugMix.
Typically, a publishable result requires numerous baseline Tactics to benchmark
your novel Tactic against. A research work will also usually span many Tasks:
the overall goal of the paper is to benchmark a new Tactic's ability to solve a
variety of Tasks.
`bocas` is structured around this idea: you will have at least one Task, and
each Task may be solved by numerous Tactics. As such, I recommend breaking your
codebase down at the Task level, splitting your paper's artifacts along the
same lines. For example, a classification paper might have the structure:
```
tasks/
  mnist/
    ...
  imagenet/
    ...
```
Keeping these concepts in mind, `bocas` recommends that you structure your code
into three levels:

- `library/` holds anything unique to your report/paper/publication. This might
  include a new augmentation, a new `keras.Layer`, a new loss function, or a
  new metric.
- `tasks/` holds all of the tasks to benchmark your new technique on.
- `paper/` holds the LaTeX or Markdown code required to render your paper.
- `paper/artifacts/` is a subdirectory of `paper/` that holds all of the
  artifacts produced by the `tasks/`. Typically, when running a Task sweep
  you'll want to provide this directory to your scripts.
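Putting it together, a repository following this layout might look like the
sketch below (file names other than the three top-level directories are
illustrative):

```
library/
  ...
tasks/
  oxford_102/
    ...
paper/
  paper.md
  artifacts/
    ...
```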
Your tasks should be structured as follows. All code for a task should reside
in `tasks/{task}/`, e.g. `tasks/oxford_102/`.
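Under this convention, a single task directory ends up containing the files
described in the rest of this overview, something like the following sketch
(the exact layout is up to you):

```
tasks/oxford_102/
  run.py     # defines run(config) for a single experiment
  sweep.py   # defines the ml_collections config with bocas.Sweep values
  scripts/
    create_plots.py
```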
You should create a `run.py` script. This script must define a `run()` function
that accepts an `ml_collections.ConfigDict` as its first positional argument.
If you follow the pattern in the Oxford Flowers 102 example, your `run.py` file
will support both independent runs and large-scale sweeps:
```python
import bocas
import keras_cv
import tensorflow as tf
import tensorflow_datasets as tfds


def run(config):
    name = f"{config.optimizer}"
    # TFDS publishes the Oxford flowers dataset as "oxford_flowers102".
    train_ds, test_ds = tfds.load(
        "oxford_flowers102", as_supervised=True, split=["train", "test"]
    )
    # Images come in varying sizes; resize and batch before training.
    resize = lambda image, label: (tf.image.resize(image, (224, 224)), label)
    train_ds = train_ds.map(resize).batch(32)
    model = keras_cv.models.ResNet50V2(
        include_rescaling=True,
        include_top=True,
        classes=102,
    )
    # Labels are integer class ids, so use sparse categorical cross-entropy.
    model.compile(
        loss="sparse_categorical_crossentropy", optimizer=config.optimizer
    )
    history = model.fit(train_ds, epochs=10)
    return bocas.Result(
        name=name,
        artifacts=[
            bocas.artifacts.KerasHistory(history, name="fit_history"),
        ],
    )
```
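For an independent run, `run()` is an ordinary function, so you can invoke it
yourself with a hand-built config. Below is a minimal sketch of such an entry
point; the config field is just the one this example reads, and bocas's own
sweep runner does not require this block:

```python
# Hypothetical standalone entry point; lets you launch a single
# experiment with `python run.py` without involving a sweep.
if __name__ == "__main__":
    import ml_collections

    config = ml_collections.ConfigDict()
    config.optimizer = "adam"
    run(config)
```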
Once you are happy with the results from a single `run.py` run, create a
`sweep.py` config file. In `sweep.py`, specify an `ml_collections.ConfigDict`
containing `bocas.Sweep` objects for any value you'd like to sweep over.
```python
import ml_collections

import bocas

config = ml_collections.ConfigDict()
config.static_value = 'any-string-or-int-or-float-or-python-object'
config.optimizer = bocas.Sweep(['sgd', 'adam'])
```
Any time a value of type `bocas.Sweep` is encountered, `bocas` expands the
config into the Cartesian product of that sweep's values and those of every
other `bocas.Sweep` parameter, and runs each resulting configuration.
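To make the expansion concrete, here is a plain-Python sketch (not bocas
internals) that enumerates the runs two hypothetical sweeps would produce:

```python
import itertools

# Two hypothetical sweeps, written as plain lists:
learning_rates = [0.01, 0.1]
optimizers = ['sgd', 'adam']

# One experiment per element of the Cartesian product: 2 * 2 = 4 runs.
for lr, opt in itertools.product(learning_rates, optimizers):
    print(f"run with learning_rate={lr}, optimizer={opt}")
```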
Be careful with this! It is easy to create a lot of experiments:
```python
config = ml_collections.ConfigDict()
config.learning_rate = bocas.Sweep([x / 100 for x in range(5, 21)])
config.optimizer = bocas.Sweep(['sgd', 'adam'])
config.model = bocas.Sweep(
    ['resnet50', 'resnet50v2', 'densenet101', 'efficientnet']
)
```
This configuration already contains `16 * 2 * 4`, or `128`, runs! That is
probably way more than you'd like. Rather than one all-encompassing sweep, try
to define a few focused experiments: run hyperparameter sweeps separately, and
hardcode the chosen values into the final runs that are used to produce the
charts.
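For example, once a separate learning-rate sweep has settled on a value, the
final `sweep.py` can hardcode it and sweep only the axis you actually want to
compare (the specific values here are illustrative):

```python
import ml_collections

import bocas

config = ml_collections.ConfigDict()
config.learning_rate = 0.05  # chosen in an earlier, separate sweep
config.optimizer = 'adam'    # likewise hardcoded from a prior sweep
config.model = bocas.Sweep(['resnet50', 'resnet50v2'])  # only 2 runs
```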
After all of your runs are complete, create some charts and plots. Save them to
the designated directory under `paper/` so that they are rendered into your
updated paper.

I recommend writing a script that produces the desired plots from the saved
artifacts, and that can be run entirely separately from the experiments
themselves. An example of this can be found in the `oxford_102` example:
```python
# scripts/create_plots.py
import bocas
import luketils

paper_dir = "paper"  # directory holding your paper and its artifacts

results = bocas.Result.load_collection("artifacts/")
metrics_to_plot = {}
for experiment in results:
    metrics = experiment.get_artifact("fit_history").metrics
    metrics_to_plot[f"{experiment.name} Train"] = metrics["accuracy"]
    metrics_to_plot[f"{experiment.name} Validation"] = metrics["val_accuracy"]

luketils.visualization.line_plot(
    metrics_to_plot,
    path=f"{paper_dir}/results/combined-accuracy.png",
    title="Model Accuracy",
)
```
Check out the full code in `oxford_102`.
That's all it takes to get running with `bocas`. Please check out the
`examples/` directory for more reading; it contains a few more patterns that
might be useful in structuring your experiments.
## bocas is under active development

While the API is relatively straightforward and simple, `bocas` currently
lacks support for multi-worker experiment runs. This means that you will need
to run all of your experiments on a single machine. If you are running 10-20
`fit()` loops to convergence, this will likely be an extremely expensive
process.

Personally, I'd rather just wait for my experiments to run than fiddle with a
ton of infrastructure. That being said, I mainly run small-scale research. If
someone wants to contribute distributed runs, feel free!
Contributions to `bocas` are more than welcome. Please see the GitHub issue
tracker, and feel free to pick up any issue annotated with "Contribution
Welcome". Additionally, bug reports are not only welcome but encouraged.
Help me improve `bocas`! I made this project because I needed the tool, and
I'm sure many others do as well. If you find it helpful, please toss a GitHub
star on the repo and follow me on Twitter.
Thank you to all of our GitHub contributors!