What's in this repo?

See my blog post on staying sane while doing ML for a brief description of how I handle my machine learning experiments.

This repo contains code that I use for this exact purpose.

It is not intended to be a package that you can install and use reliably. This is a personal project, but I'll be happy if you find at least part of it useful.

Docs / Examples

There are currently no docs, but see the blog post for the philosophy and the example notebook for some usage examples.

Note

You'll probably be much better off using sacred.
This code needs a major refactor: it was cut out of a larger project that also handled job scheduling for my AWS workers (see this repo). It therefore contains a lot of code that could be greatly simplified.

Summary

I use bnb to track my ML experiments.

from sklearn import datasets, ensemble, metrics
from bnb import Experiment, get_current_context

ex = Experiment('iris', 'mytag', dirty_ok=True, auto_enabled=True)

@ex.watch
def main(n_estimators, max_depth, criterion):
    ctx = get_current_context()
    clf = ensemble.RandomForestClassifier(n_estimators=n_estimators,
                                          max_depth=max_depth,
                                          criterion=criterion)

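    # Fit and evaluate on the full iris dataset (no train/test split).
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    clf = clf.fit(X, y)
    p = clf.predict(X)

    # Macro-averaged scores: the unweighted mean over the three classes.
    precision = metrics.precision_score(y, p, average='macro')
    recall = metrics.recall_score(y, p, average='macro')

    # Record the scores for this run.
    ctx.report('precision', precision)
    ctx.report('recall', recall)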

Running experiments as in the snippet below creates entries in a TinyDB database that can be conveniently explored later.

Experiment:

for ne in [-1, 1, 2, 3, 4]:
    for md in [-1, 1, 2, 3, 4]:
        main(ne, md, criterion='gini')
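
To give a rough idea of what "tracking" means here, the sketch below records one entry per call of a decorated function: the call arguments as config, plus whatever gets reported during the run. This is not bnb's implementation. `MiniExperiment`, its `runs` list and the placeholder `score` are hypothetical names made up for this illustration, and entries are kept in a plain list rather than a TinyDB database.

```python
import functools
import inspect


class MiniExperiment:
    """Hypothetical stand-in for an experiment tracker; not bnb's actual code."""

    def __init__(self, name):
        self.name = name
        self.runs = []        # one dict per call of a watched function
        self._current = None  # the run currently being recorded

    def watch(self, func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind positional and keyword arguments to parameter names
            # so the config is stored as a name -> value mapping.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            run = {"config": dict(bound.arguments), "results": {}}
            self._current = run
            try:
                return func(*args, **kwargs)
            finally:
                # Store the entry even if the function raised.
                self.runs.append(run)
                self._current = None

        return wrapper

    def report(self, key, value):
        # Attach a named result to the run that is currently executing.
        self._current["results"][key] = value


ex = MiniExperiment("demo")


@ex.watch
def main(n_estimators, max_depth, criterion):
    # Placeholder value, only here so there is something to report.
    ex.report("score", n_estimators * max_depth)


for ne in [1, 2]:
    main(ne, 3, criterion="gini")

for run in ex.runs:
    print(run)
```

Binding the arguments through `inspect.signature` means positional and keyword arguments end up under the same parameter names, which is what makes the stored configs comparable across runs.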

Getting the top 3 runs, showing only the scores and config:

>>> from bnb.vis.core import get
>>> res = get('iris')
>>> res.top_k(('precision', 'recall'), k=3).no_details().df
     results              config                       
   precision    recall criterion max_depth n_estimators
19  0.993464  0.993333      gini         4            3
24  0.980125  0.980000      gini         4            4
9   0.975309  0.973333      gini         4            1

>>> res.top_k(('precision', 'recall'), k=3).no_details().compare().df
     results                 config
   precision    recall n_estimators
19  0.993464  0.993333            3
24  0.980125  0.980000            4
9   0.975309  0.973333            1
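
Judging from the two outputs above, `compare()` appears to drop the config columns that are identical across the selected runs (`criterion` and `max_depth` here) and keep only the ones that differ. The plain-Python sketch below mimics that behaviour on the three rows shown above. The helpers `top_k` and `varying_config` are hypothetical and are not taken from bnb, and ranking by the tuple of metrics in the given order is an assumption about how ties are broken.

```python
# The three rows from the output above, deliberately out of order.
runs = [
    {"id": 9, "results": {"precision": 0.975309, "recall": 0.973333},
     "config": {"criterion": "gini", "max_depth": 4, "n_estimators": 1}},
    {"id": 19, "results": {"precision": 0.993464, "recall": 0.993333},
     "config": {"criterion": "gini", "max_depth": 4, "n_estimators": 3}},
    {"id": 24, "results": {"precision": 0.980125, "recall": 0.980000},
     "config": {"criterion": "gini", "max_depth": 4, "n_estimators": 4}},
]


def top_k(runs, keys, k):
    """Return the k best runs, ranked by the given result keys in order."""
    def rank(run):
        return tuple(run["results"][key] for key in keys)
    return sorted(runs, key=rank, reverse=True)[:k]


def varying_config(runs):
    """Keep only the config keys whose value differs between the runs."""
    keys = runs[0]["config"].keys()
    varying = [key for key in keys
               if len({run["config"][key] for run in runs}) > 1]
    return [{key: run["config"][key] for key in varying} for run in runs]


best = top_k(runs, ("precision", "recall"), k=3)
print([run["id"] for run in best])
print(varying_config(best))
```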

elanmart/bnb
