The evaluation and testing framework for computer vision models

Control performance risks, bias and security issues in AI models

Install Moonwatcher 🌝

pip install moonwatcher

Try the demos

Warning

The demos require wget to be installed on your system.

The demos check how a model performs on unusual values of brightness, contrast and saturation in the underlying dataset. To see how to create your own test scenarios, check out the Quickstart.

Object detection (the demo will download the val2017 set of COCO and use a subset of it):

python -m moonwatcher.demo_detection

Classification (the demo will download STL-10 as a dataset):

python -m moonwatcher.demo_classification

🏃‍♀️ Quickstart
- 1. 🧑‍🏫 Slices, Checks and Checksuites
  - 🍰 Slices
  - ✅ Checks
  - 📄 Checksuites
- 2. 🤖 Run automated checks
- 3. 👨‍💻 Write custom checks and checksuites
🖥️ Web app

🏃‍♀️ Quickstart

1. 🧑‍🏫 Slices, Checks and Checksuites

Apart from models and datasets, this framework has three core concepts: Slices, Checks and Checksuites.

Slices

A slice is a subset of a dataset. There are different methods in the framework to create those subsets for sophisticated evaluation and testing setups.
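
To make the idea concrete, here is a minimal sketch in plain Python of slicing a dataset by a per-image property. It does not use Moonwatcher's own slicing functions; `make_slice`, the `brightness` field and the sample records are hypothetical names chosen only for illustration.

```python
# Hypothetical records: each item carries a precomputed brightness value in [0, 1].
samples = [
    {"id": 0, "brightness": 0.12},
    {"id": 1, "brightness": 0.55},
    {"id": 2, "brightness": 0.91},
    {"id": 3, "brightness": 0.08},
]


def make_slice(items, key, predicate):
    """Return the subset of items whose `key` value satisfies `predicate`."""
    return [item for item in items if predicate(item[key])]


# Keep only unusually dark images (brightness below 0.2).
dark_slice = make_slice(samples, "brightness", lambda value: value < 0.2)
print([item["id"] for item in dark_slice])
```

Evaluating a model on `dark_slice` instead of `samples` is what lets a check target one specific condition, such as low brightness.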

Checks

A check defines one specific evaluation and/or testing setup. It specifies the metric to use, the dataset or slice to evaluate or test on, and optionally a test comparison. When a check is applied to a model, it returns the computed metric value and, if a comparison was given, the test result (True/False).
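
The following sketch shows, under simplifying assumptions, how a check could turn a metric, a comparison operator and a threshold into a value plus a pass/fail result. It is not Moonwatcher's implementation; `run_check`, `COMPARATORS` and the result keys are hypothetical.

```python
import operator

# Map comparison strings to functions (illustrative, not Moonwatcher's internals).
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def run_check(predictions, labels, comparator=None, threshold=None):
    """Compute the metric and, if a comparison is given, the pass/fail result."""
    score = accuracy(predictions, labels)
    passed = COMPARATORS[comparator](score, threshold) if comparator else None
    return {"metric": "Accuracy", "value": score, "passed": passed}


# Three of four predictions are correct, so the check "Accuracy > 0.8" fails.
result = run_check([1, 0, 1, 1], [1, 0, 0, 1], ">", 0.8)
print(result)
```

Leaving out the comparator gives a pure evaluation: the metric value is returned and `passed` stays `None`.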

Checksuites

A checksuite combines multiple checks into one suite that can be run on a model in a single call.
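
As a rough sketch of the idea, a suite can be thought of as a list of checks whose results are collected and aggregated. The names `run_suite` and `toy_model`, and the shape of the result dictionaries, are hypothetical; Moonwatcher's own result structure may differ.

```python
def run_suite(name, checks, model):
    """Run every check on the model and aggregate the individual results."""
    results = [check(model) for check in checks]
    return {
        "name": name,
        "results": results,
        "passed": all(r["passed"] for r in results),
    }


# Hypothetical stand-ins for real checks: each takes a model and returns a result dict.
def accuracy_check(model):
    return {"name": "AccuracyCheck", "passed": model["accuracy"] > 0.8}


def precision_check(model):
    return {"name": "PrecisionCheck", "passed": model["precision"] > 0.8}


# A toy "model" that only stores precomputed metric values.
toy_model = {"accuracy": 0.9, "precision": 0.7}
suite_result = run_suite("AllChecks", [accuracy_check, precision_check], toy_model)
print(suite_result["passed"])
```

In this sketch the suite passes only if every check passes, which is one reasonable aggregation rule among several.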

2. 🤖 Run automated checks

Look into the relevant demo (demo_classification.py or demo_detection.py) to see how to create the MoonwatcherModel and MoonwatcherDataset from your data.

from moonwatcher.check import automated_checking
from moonwatcher.model.model import MoonwatcherModel
from moonwatcher.dataset.dataset import MoonwatcherDataset

# Load your model (your_model) and dataset (your_dataset) here

# Look into the relevant demo (demo_classification.py or demo_detection.py)
# to see how to create the MoonwatcherModel and MoonwatcherDataset from your data.
mw_model = MoonwatcherModel(
  model=your_model,
  ...
)
mw_dataset = MoonwatcherDataset(
  dataset=your_dataset,
  ...
)

automated_checking(model=mw_model, dataset=mw_dataset)

3. 👨‍💻 Write custom checks and checksuites

A custom check is written like this:

from moonwatcher.check import Check

accuracy_check = Check(
    name="AccuracyCheck",
    dataset_or_slice=mw_dataset,
    metric="Accuracy",
    operator=">",
    value=0.8,
)

# and run it on your model:
check_result = accuracy_check(mw_model)

Tip

You can also slice your dataset and use a slice for the check instead of the whole dataset.

Tip

Class/category-based checking is not yet supported, but it is planned for the next iteration.

Now add another check and combine both into a checksuite:

from moonwatcher.check import Check, CheckSuite

precision_check = Check(
    name="PrecisionCheck",
    dataset_or_slice=mw_dataset,
    metric="Precision",
    operator=">",
    value=0.8,
)

# Combine them into a checksuite
first_checksuite = CheckSuite(
    name="AllChecks", checks=[accuracy_check, precision_check]
)

# and run it on your model:
checksuite_result = first_checksuite(mw_model)

🖥️ Web app

The package can be used on its own. It is open source and always will be. We additionally developed a web app that you can use to visualize results. To try it out, check out the Web app instructions.

⭐️ Don’t forget to star the project if you want to support open source testing of ML models.

That's it. Have fun! 🌚

moonwatcher-ai/moonwatcher