creme

Online machine learning in Python

creme is a library for online machine learning, also known as incremental learning. Online learning is a machine learning regime where a model learns one observation at a time. This is in contrast to batch learning, where all the data is processed in one go. Incremental learning is desirable when the data is too big to fit in memory, or simply when you want to handle streaming data. In addition to many online machine learning algorithms, creme provides utilities for extracting features from a stream of data. The API is heavily inspired by that of scikit-learn, meaning that users who are familiar with it should feel comfortable.
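To make "one observation at a time" concrete, here is a minimal pure-Python sketch of a linear regression trained by stochastic gradient descent. It does not use creme: the class `OnlineLinearRegression`, its `learn` and `predict` methods, and the synthetic `stream` generator are all assumptions made up for illustration.

```python
class OnlineLinearRegression:
    """Toy linear regression updated by SGD, one observation at a time (not creme's API)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}      # one weight per feature name, created lazily
        self.intercept = 0.0

    def predict(self, x):
        return self.intercept + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn(self, x, y):
        # Gradient step on the squared error of this single observation
        error = self.predict(x) - y
        self.intercept -= self.learning_rate * error
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.learning_rate * error * v
        return self


def stream():
    """Hypothetical noise-free stream following y = 2 * a + 1."""
    for i in range(2000):
        a = (i % 10) / 10
        yield {'a': a}, 2 * a + 1


model = OnlineLinearRegression()
for x, y in stream():
    model.learn(x, y)  # the full dataset is never held in memory
```

After the loop, `model.predict({'a': 0.5})` should be close to 2. A batch learner would instead need every `(x, y)` pair available at once before fitting.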

Installation

☝️ creme is intended to work with Python 3.6 and above.

creme can simply be installed with pip.

pip install creme

You can also install the latest development version like so:

pip install git+https://github.com/creme-ml/creme --upgrade

As for dependencies, creme mostly relies on Python's standard library. Sometimes it relies on numpy, scipy, and scikit-learn to avoid reinventing the wheel.

Quick example

In the following example we'll use a linear regression to forecast the number of available bikes in bike stations in the city of Toulouse 🚲.

We'll use the available numeric features, and also calculate running averages of the target. Before being fed to the linear regression, the features will be scaled using a StandardScaler. Note that each of these steps works in a streaming fashion, including the feature extraction. We'll evaluate the model by asking it to forecast 30 minutes ahead while delaying the true answers, which ensures that we're simulating a production scenario. Finally, we will print the current score every 30,000 predictions.

>>> import datetime as dt
>>> from creme import compose
>>> from creme import datasets
>>> from creme import feature_extraction
>>> from creme import linear_model
>>> from creme import metrics
>>> from creme import model_selection
>>> from creme import preprocessing
>>> from creme import stats

>>> X_y = datasets.fetch_bikes()

>>> def add_hour(x):
...     x['hour'] = x['moment'].hour
...     return x

>>> model = compose.Whitelister('clouds', 'humidity', 'pressure', 'temperature', 'wind')
>>> model += (
...     add_hour |
...     feature_extraction.TargetAgg(by=['station', 'hour'], how=stats.Mean())
... )
>>> model += feature_extraction.TargetAgg(by='station', how=stats.EWMean(0.5))
>>> model |= preprocessing.StandardScaler()
>>> model |= linear_model.LinearRegression()

>>> model_selection.online_qa_score(
...     X_y=X_y,
...     model=model,
...     metric=metrics.MAE(),
...     on='moment',
...     lag=dt.timedelta(minutes=30),
...     print_every=30_000
... )
[30,000] MAE: 2.193069
[60,000] MAE: 2.249345
[90,000] MAE: 2.288321
[120,000] MAE: 2.265257
[150,000] MAE: 2.2674
[180,000] MAE: 2.282485
MAE: 2.285921
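One way to picture the delayed evaluation is sketched below in plain Python. This is not creme's implementation of `online_qa_score`. The helper `delayed_mae`, the baseline `MeanRegressor`, and the five-point stream are hypothetical, written under the assumption that a true answer only becomes available `lag` after the moment of its prediction.

```python
import collections
import datetime as dt


class MeanRegressor:
    """Hypothetical baseline: predicts the mean of the targets learned so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self, x):
        return self.mean

    def learn(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def delayed_mae(stream, model, lag):
    """Return the MAE when each true answer is revealed `lag` after its prediction."""
    pending = collections.deque()  # (moment, x, y, prediction), oldest first
    total_error, count = 0.0, 0

    for moment, x, y in stream:
        # Reveal every answer that is at least `lag` old: score it, then learn from it
        while pending and pending[0][0] + lag <= moment:
            _, old_x, old_y, old_pred = pending.popleft()
            total_error += abs(old_y - old_pred)
            count += 1
            model.learn(old_x, old_y)
        # Predict now; the model has not seen this answer yet
        pending.append((moment, x, y, model.predict(x)))

    # Score the predictions whose answers were still pending when the stream ended
    for _, _, old_y, old_pred in pending:
        total_error += abs(old_y - old_pred)
        count += 1

    return total_error / count if count else 0.0


# Made-up stream: one observation every 10 minutes with targets 1.0 to 5.0
start = dt.datetime(2019, 1, 1)
stream = [
    (start + dt.timedelta(minutes=10 * i), {'hour': 0}, float(i + 1))
    for i in range(5)
]
mae = delayed_mae(stream, MeanRegressor(), lag=dt.timedelta(minutes=30))
```

On this toy stream `mae` is 2.5. The first three predictions are made before any answer has been revealed, which is the situation a model forecasting 30 minutes ahead would face in production.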

We can also draw the model to understand how the data flows through.

>>> dot = model.draw()
[Figure: graph of the bikes pipeline]

Using only a few lines of code, we've built a robust model and evaluated it by simulating a production scenario. You can find a more detailed version of this example here. creme is a framework that has a lot to offer, so we kindly refer you to the documentation if you want to know more.
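As a rough illustration of what "streaming" means for the feature steps in the example above, the sketch below maintains a running mean of the target per station and a running standardization of a feature, updating both with one value at a time. It is written in plain Python and does not reflect how creme implements `TargetAgg` or `StandardScaler`. The classes `RunningMean` and `RunningScaler` and the sample values are assumptions made up for illustration.

```python
import collections
import math


class RunningMean:
    """Mean updated incrementally, without storing past values."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n
        return self


class RunningScaler:
    """Running mean and variance via Welford's algorithm, used to standardize values."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return self

    def scale(self, value):
        if self.n < 2 or self.m2 == 0:
            return 0.0  # not enough information to estimate a spread yet
        std = math.sqrt(self.m2 / self.n)  # population standard deviation
        return (value - self.mean) / std


# Made-up observations: (station, number of available bikes)
target_mean_by_station = collections.defaultdict(RunningMean)
for station, bikes in [('A', 4), ('B', 10), ('A', 6), ('B', 14)]:
    target_mean_by_station[station].update(bikes)

# Made-up temperature readings fed to the scaler one at a time
scaler = RunningScaler()
for temperature in [2.0, 4.0, 6.0]:
    scaler.update(temperature)
scaled = scaler.scale(6.0)
```

In a real pipeline the per-station mean would have to be read before the current target is used to update it, otherwise the feature would leak the answer.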

Benchmarks

All the benchmarks, including reproducible code, are available here.

The following table summarizes the performance of regression methods from various libraries, using their default parameters.

| Library | Model | MSE | Fit time | Average fit time | Predict time | Average predict time |
|---|---|---|---|---|---|---|
| creme | LinearRegression | 118.549437 | 2s, 580ms | 10μs | 759ms | 3μs |
| creme | PARegressor | 143.477210 | 6s, 994ms | 27μs | 1s, 305ms | 5μs |
| creme | KNeighborsRegressor | 155.585250 | 394ms | 1μs | 37s, 4ms | 146μs |
| scikit-learn | SGDRegressor | 120.185848 | 36s, 433ms | 144μs | 14s, 766ms | 58μs |
| scikit-learn | PassiveAggressiveRegressor | 143.477210 | 35s, 551ms | 141μs | 14s, 599ms | 57μs |
| PyTorch (CPU) | Linear | 142.495995 | 47s, 335ms | 187μs | 14s, 822ms | 58μs |
| Keras on Tensorflow (CPU) | Dense | 142.494512 | 1m, 18s, 296ms | 310μs | 49s, 225ms | 195μs |
| scikit-garden | MondrianTreeRegressor | 201.687033 | 35s, 983ms | 142μs | 23s, 502ms | 93μs |
| scikit-garden | MondrianForestRegressor | 142.364156 | 5m, 58s, 226ms | 1ms, 420μs | 2m, 40s, 728ms | 637μs |

Contributing

Like many subfields of machine learning, online learning is far from being an exact science, so there is still a lot to do. Feel free to contribute in any way you like; we're always open to new ideas and approaches. If you want to contribute to the code base, please check out the CONTRIBUTING.md file. Also take a look at the issue tracker and see if anything takes your fancy.

Last but not least, you are more than welcome to share with us how you're using creme or online learning in general! We believe that online learning solves a lot of pain points in practice, and we would love to exchange experiences.

This project follows the all-contributors specification. Contributions of any kind are welcome!

Max Halford 📆 💻
AdilZouitine 💻
Raphael Sourty 💻
Geoffrey Bolmier 💻
vincent d warmerdam 💻
VaysseRobin 💻
Lygon Bowen-West 💻
Florent Le Gac 💻
Adrian Rosebrock 📝

License

See the license file.