Frouros is a Python library for drift detection in machine learning systems that provides a combination of classical and more recent algorithms for both concept and data drift detection.
"Everything changes and nothing stands still"
"You could not step twice into the same river"
Heraclitus of Ephesus (535-475 BCE)
⚡️ Quickstart
Concept drift
As a quick example, we can induce concept drift in the wine dataset to show how to use a concept drift detector such as DDM (Drift Detection Method).
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from frouros.detectors.concept_drift import DDM, DDMConfig
np.random.seed(seed=31)
# Load wine dataset
X, y = load_wine(return_X_y=True)
# Split train (70%) and test (30%)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)
# IMPORTANT: Induce/simulate concept drift in the last part (20%)
# of y_test by modifying some labels (approx. 50%), thereby changing P(y|X)
drift_size = int(y_test.shape[0] * 0.2)
y_test_drift = y_test[-drift_size:]
modify_idx = np.random.rand(*y_test_drift.shape) <= 0.5
y_test_drift[modify_idx] = (y_test_drift[modify_idx] + 1) % len(np.unique(y_test))
y_test[-drift_size:] = y_test_drift
# Define and fit model
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("model", LogisticRegression()),
    ]
)
pipeline.fit(X=X_train, y=y_train)
# Detector configuration and instantiation
config = DDMConfig(
    warning_level=2.0,
    drift_level=3.0,
    min_num_instances=30,
)
detector = DDM(config=config)
# Simulate data stream (assuming test label available after prediction)
for i, (X, y) in enumerate(zip(X_test, y_test)):
    y_pred = pipeline.predict(X.reshape(1, -1))
    # 0 if the prediction is correct, 1 otherwise
    error = 1 - int((y_pred == y).item())
    detector.update(value=error)
    status = detector.status
    if status["drift"]:
        print(f"Drift detected at index {i}")
        break
>> Drift detected at index 44
More concept drift examples can be found here.
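To make the roles of `warning_level`, `drift_level` and `min_num_instances` more concrete, the following is a minimal, dependency-free sketch of the decision rule described by Gama et al. (2004). It is not Frouros's implementation: the class name `ToyDDM`, its string return values, the strict comparisons and the synthetic error stream are all assumptions made for illustration.

```python
import math


class ToyDDM:
    """Illustrative DDM-style detector (hypothetical, not the Frouros class)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_num_instances=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_num_instances = min_num_instances
        self.num_instances = 0
        self.num_errors = 0
        self.p_min = math.inf  # error rate at the best point seen so far
        self.s_min = math.inf  # its standard deviation

    def update(self, error):
        """Consume one 0/1 error and return "stable", "warning" or "drift"."""
        self.num_instances += 1
        self.num_errors += error
        p = self.num_errors / self.num_instances
        s = math.sqrt(p * (1.0 - p) / self.num_instances)
        # Do not judge the stream until enough samples have been seen.
        if self.num_instances < self.min_num_instances:
            return "stable"
        # Remember the point where p + s was lowest.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        # Strict comparisons (an assumption of this sketch) avoid an alarm
        # when no error has been observed yet and both minima are zero.
        if p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic stream (assumption): 10% errors for 200 samples, then 50% errors.
errors = [int(i % 10 == 9) for i in range(200)] + [i % 2 for i in range(200)]

detector = ToyDDM()
drift_index = None
for i, error in enumerate(errors):
    if detector.update(error) == "drift":
        drift_index = i
        break

print(f"Drift detected at index {drift_index}")
```

On this synthetic stream the alarm is raised shortly after the error rate changes at index 200. Raising `drift_level` makes the detector slower but less prone to false alarms.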
Data drift
As a quick example, we can induce data drift in the iris dataset to show how to use a data drift detector such as the Kolmogorov-Smirnov test.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from frouros.detectors.data_drift import KSTest
np.random.seed(seed=31)
# Load iris dataset
X, y = load_iris(return_X_y=True)
# Split train (70%) and test (30%)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)
# Set the feature index to which detector is applied
dim_idx = 0
# IMPORTANT: Induce/simulate data drift in the selected feature of X_test by
# applying some gaussian noise, thereby changing P(X)
X_test[:, dim_idx] += np.random.normal(
    loc=0.0,
    scale=3.0,
    size=X_test.shape[0],
)
# Define and fit model
model = DecisionTreeClassifier(random_state=31)
model.fit(X=X_train, y=y_train)
# Set significance level for hypothesis testing
alpha = 0.001
# Define and fit detector
detector = KSTest()
detector.fit(X=X_train[:, dim_idx])
# Apply detector to the selected feature of X_test
result = detector.compare(X=X_test[:, dim_idx])
# Check if drift is taking place
result[0].p_value < alpha
>> True # Data drift detected.
# Therefore, we can reject H0 (both samples come from the same distribution).
More data drift examples can be found here.
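For readers who want to see what the two-sample Kolmogorov-Smirnov test computes, the sketch below derives the statistic and an approximate p-value using only the standard library. It is an illustration under stated assumptions, not the code Frouros runs: the helper names `ks_statistic` and `ks_p_value` are hypothetical, the p-value uses the asymptotic Kolmogorov series (reasonable for moderately large samples, not exact for small ones), and the data are synthetic with a shifted mean.

```python
import bisect
import math
import random


def ks_statistic(reference, sample):
    """Return the largest gap between the two empirical CDFs."""
    ref, smp = sorted(reference), sorted(sample)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for value in ref + smp:
        cdf_ref = bisect.bisect_right(ref, value) / len(ref)
        cdf_smp = bisect.bisect_right(smp, value) / len(smp)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_smp))
    return largest_gap


def ks_p_value(statistic, n, m, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    lam = statistic * math.sqrt(n * m / (n + m))
    if lam == 0.0:
        return 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * series))


# Synthetic data (assumption): same spread, mean moved by +2.0.
random.seed(31)
reference = [random.gauss(5.8, 0.8) for _ in range(105)]
shifted = [random.gauss(7.8, 0.8) for _ in range(45)]

alpha = 0.001
statistic = ks_statistic(reference, shifted)
p_value = ks_p_value(statistic, len(reference), len(shifted))
print(f"statistic={statistic:.3f}, p_value={p_value:.2e}, drift={p_value < alpha}")
```

As in the Frouros example, drift is flagged when the p-value falls below the chosen significance level `alpha`.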
🛠 Installation
Frouros can be installed via pip:
pip install frouros
🕵🏻‍♂️ Drift detection methods
The currently implemented detectors are listed in the following table.
Drift detector | Type | Family | Univariate (U) / Multivariate (M) | Numerical (N) / Categorical (C) | Method | Reference |
---|---|---|---|---|---|---|
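Concept drift | Streaming | Change detection | U | N | BOCD | Adams and MacKay (2007) |
Concept drift | Streaming | Change detection | U | N | CUSUM | Page (1954) |
Concept drift | Streaming | Change detection | U | N | Geometric moving average | Roberts (1959) |
Concept drift | Streaming | Change detection | U | N | Page Hinkley | Page (1954) |
Concept drift | Streaming | Statistical process control | U | N | DDM | Gama et al. (2004) |
Concept drift | Streaming | Statistical process control | U | N | ECDD-WT | Ross et al. (2012) |
Concept drift | Streaming | Statistical process control | U | N | EDDM | Baena-García et al. (2006) |
Concept drift | Streaming | Statistical process control | U | N | HDDM-A | Frias-Blanco et al. (2014) |
Concept drift | Streaming | Statistical process control | U | N | HDDM-W | Frias-Blanco et al. (2014) |
Concept drift | Streaming | Statistical process control | U | N | RDDM | Barros et al. (2017) |
Concept drift | Streaming | Window based | U | N | ADWIN | Bifet and Gavalda (2007) |
Concept drift | Streaming | Window based | U | N | KSWIN | Raab et al. (2020) |
Concept drift | Streaming | Window based | U | N | STEPD | Nishida and Yamauchi (2007) |
Data drift | Batch | Distance based | U | N | Bhattacharyya distance | Bhattacharyya (1946) |
Data drift | Batch | Distance based | U | N | Earth Mover's distance | Rubner et al. (2000) |
Data drift | Batch | Distance based | U | N | Hellinger distance | Hellinger (1909) |
Data drift | Batch | Distance based | U | N | Histogram intersection normalized complement | Swain and Ballard (1991) |
Data drift | Batch | Distance based | U | N | Jensen-Shannon distance | Lin (1991) |
Data drift | Batch | Distance based | U | N | Kullback-Leibler divergence | Kullback and Leibler (1951) |
Data drift | Batch | Distance based | M | N | MMD | Gretton et al. (2012) |
Data drift | Batch | Distance based | U | N | PSI | Wu and Olson (2010) |
Data drift | Batch | Statistical test | U | C | Chi-square test | Pearson (1900) |
Data drift | Batch | Statistical test | U | N | Cramér-von Mises test | Cramér (1902) |
Data drift | Batch | Statistical test | U | N | Kolmogorov-Smirnov test | Massey Jr (1951) |
Data drift | Batch | Statistical test | U | N | Mann-Whitney U test | Mann and Whitney (1947) |
Data drift | Batch | Statistical test | U | N | Welch's t-test | Welch (1947) |
Data drift | Streaming | Distance based | M | N | MMD | Gretton et al. (2012) |
Data drift | Streaming | Statistical test | U | N | Incremental Kolmogorov-Smirnov test | dos Reis et al. (2016) |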
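To give a feel for how a detector in the "Change detection" family works, here is a minimal one-sided CUSUM sketch in the spirit of Page (1954). It is a simplification and not Frouros's `CUSUM` class: the function name `cusum_first_alarm` is hypothetical, the reference mean is assumed to be known in advance rather than estimated from the stream, and the parameter values are chosen only for this toy stream.

```python
def cusum_first_alarm(values, target_mean, delta=0.25, threshold=3.0):
    """Return the index of the first alarm, or None if no alarm is raised."""
    cumulative_sum = 0.0
    for index, value in enumerate(values):
        # Accumulate only deviations above the tolerated slack `delta`;
        # the sum is reset to zero whenever it would become negative.
        cumulative_sum = max(0.0, cumulative_sum + value - target_mean - delta)
        if cumulative_sum > threshold:
            return index
    return None


# Synthetic stream (assumption): the mean jumps from 0.0 to 1.0 at index 100.
stream = [0.0] * 100 + [1.0] * 100
alarm_index = cusum_first_alarm(stream, target_mean=0.0)
print(alarm_index)  # → 104
```

Each post-change sample adds 0.75 to the cumulative sum, so the threshold of 3.0 is first exceeded on the fifth such sample. A larger `threshold` delays detection but tolerates more noise.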
❗ What is and what is not Frouros?
Unlike other libraries that, in addition to providing drift detection algorithms, include other functionality such as anomaly/outlier detection, adversarial detection and imbalanced learning, Frouros has and will ONLY have one purpose: drift detection.
We firmly believe that machine learning libraries and frameworks should not follow the "jack of all trades, master of none" principle. Instead, they should focus on a single task and do it well.
✅ Who is using Frouros?
Frouros is actively being used by the following projects to implement drift detection in machine learning pipelines:
If you want your project listed here, do not hesitate to send us a pull request.
👍 Contributing
Check out the contribution section.
💬 Citation
Although the Frouros paper is still a preprint, you can cite the preprint version below (to be replaced by the published paper once it is available).
@article{cespedes2022frouros,
title={Frouros: A Python library for drift detection in machine learning systems},
author={C{\'e}spedes-Sisniega, Jaime and L{\'o}pez-Garc{\'\i}a, {\'A}lvaro },
journal={arXiv preprint arXiv:2208.06868},
year={2022}
}
📝 License
Frouros is open-source software licensed under the BSD-3-Clause license.
🙏 Acknowledgements
Frouros has received funding from the Agencia Estatal de Investigación, Unidad de Excelencia María de Maeztu, ref. MDM-2017-0765.