/skpro

A unified framework for tabular probabilistic regression and probability distributions in Python

Primary language: Python · License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

🚀 Version 2.6.0 out now! Read the release notes here.

skpro is a library for supervised probabilistic prediction in Python. It provides scikit-learn-like, scikit-base compatible interfaces to:

  • tabular supervised regressors for probabilistic prediction - interval, quantile and distribution predictions
  • tabular probabilistic time-to-event and survival prediction - instance-individual survival distributions
  • metrics to evaluate probabilistic predictions, e.g., pinball loss, empirical coverage, CRPS, survival losses
  • reductions to turn scikit-learn regressors into probabilistic skpro regressors, such as bootstrap or conformal
  • building pipelines and composite models, including tuning via probabilistic performance metrics
  • symbolic probability distributions with value domain of pandas.DataFrame-s and pandas-like interface
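
To make two of the metrics named above concrete, here is a minimal pure-Python sketch of the pinball loss (for a single quantile prediction) and empirical coverage (for interval predictions). The function names `pinball_loss` and `empirical_coverage` are hypothetical and chosen for illustration; this is not skpro's implementation, which operates on pandas objects and lives in `skpro.metrics`.

```python
def pinball_loss(y_true, y_quantile, alpha):
    """Pinball (quantile) loss of one predicted alpha-quantile.

    Under-predictions are weighted by alpha, over-predictions by 1 - alpha.
    """
    diff = y_true - y_quantile
    return alpha * diff if diff >= 0 else (alpha - 1) * diff


def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside their predicted interval."""
    hits = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(hits) / len(hits)


# toy data, made up for illustration only
y_obs = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 0.0, 0.0, 0.0]
upper = [2.0, 3.0, 3.5, 3.5]

coverage = empirical_coverage(y_obs, lower, upper)  # 3 of 4 observations covered
loss_under = pinball_loss(10.0, 8.0, alpha=0.9)  # prediction too low, heavy penalty
loss_over = pinball_loss(8.0, 10.0, alpha=0.9)  # prediction too high, light penalty
```

For a high quantile such as alpha = 0.9, predicting too low costs more than predicting too high, which is what pushes a well-calibrated 0.9-quantile prediction towards the upper end of the distribution.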

📚 Documentation

Documentation
⭐ Tutorials | New to skpro? Here's everything you need to know!
📋 Binder Notebooks | Example notebooks to play with in your browser.
👩‍💻 User Guides | How to use skpro and its features.
✂️ Extension Templates | How to build your own estimator using skpro's API.
🎛️ API Reference | The detailed reference for skpro's API.
🛠️ Changelog | Changes and version history.
🌳 Roadmap | skpro's software and community development plan.
📝 Related Software | A list of related software.

💬 Where to ask questions

Questions and feedback are extremely welcome! We strongly believe in the value of sharing help publicly, as it allows a wider audience to benefit from it.

skpro is maintained by the sktime community, so we use the same social channels.

Type | Platforms
🐛 Bug Reports | GitHub Issue Tracker
✨ Feature Requests & Ideas | GitHub Issue Tracker
👩‍💻 Usage Questions | GitHub Discussions · Stack Overflow
💬 General Discussion | GitHub Discussions
🏭 Contribution & Development | dev-chat channel · Discord
🌐 Community collaboration session | Discord - Fridays 13 UTC, dev/meet-ups channel

💫 Features

Our objective is to enhance the interoperability and usability of the AI model ecosystem:

  • skpro is compatible with scikit-learn and sktime, e.g., an sktime probabilistic forecaster can be built from an skpro probabilistic regressor, which in turn is an sklearn regressor with a probabilistic mode added by skpro

  • skpro provides a mini-package management framework for first-party implementations, and for interfacing popular second- and third-party components, such as the cyclic-boosting, MAPIE, or ngboost packages.

skpro curates libraries of components of the following types:

Module | Status | Links
Probabilistic tabular regression | maturing | Tutorial · API Reference · Extension Template
Time-to-event (survival) prediction | maturing | Tutorial · API Reference · Extension Template
Performance metrics | maturing | API Reference
Probability distributions | maturing | Tutorial · API Reference · Extension Template

โณ Installing skpro

To install skpro, use pip:

pip install skpro

or, with maximum dependencies,

pip install skpro[all_extras]

Releases are available as source packages and binary wheels. You can see all available wheels here.

⚡ Quickstart

Making probabilistic predictions

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

from skpro.regression.residual import ResidualDouble

# step 1: data specification
X, y = load_diabetes(return_X_y=True, as_frame=True)
# y_test is kept so that predictions can be compared to actuals later
X_train, X_new, y_train, y_test = train_test_split(X, y)

# step 2: specifying the regressor - any compatible regressor is valid!
# example - "squaring residuals" regressor
# random forest for mean prediction
# linear regression for variance prediction
reg_mean = RandomForestRegressor()
reg_resid = LinearRegression()
reg_proba = ResidualDouble(reg_mean, reg_resid)

# step 3: fitting the model to training data
reg_proba.fit(X_train, y_train)

# step 4: predicting labels on new data

# probabilistic prediction modes - pick any or multiple

# full distribution prediction
y_pred_proba = reg_proba.predict_proba(X_new)

# interval prediction
y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)

# quantile prediction
y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])

# variance prediction
y_pred_var = reg_proba.predict_var(X_new)

# mean prediction is same as "classical" sklearn predict, also available
y_pred_mean = reg_proba.predict(X_new)
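
The prediction modes above are different views of the same predictive distribution. As a minimal sketch of how they relate, assume a normal predictive distribution and an equal-tailed interval, so that a 0.9 coverage interval is bounded by the 0.05 and 0.95 quantiles. The helper name `normal_interval` and the numbers are made up for illustration; this is not skpro code and uses only the Python standard library.

```python
from statistics import NormalDist


def normal_interval(mean, var, coverage=0.9):
    """Equal-tailed prediction interval of a normal predictive distribution."""
    dist = NormalDist(mu=mean, sigma=var ** 0.5)
    alpha_low = (1 - coverage) / 2  # e.g., 0.05 for coverage 0.9
    alpha_high = 1 - alpha_low  # e.g., 0.95 for coverage 0.9
    return dist.inv_cdf(alpha_low), dist.inv_cdf(alpha_high)


# hypothetical predicted mean and variance for a single test instance
lower, upper = normal_interval(mean=150.0, var=400.0, coverage=0.9)
```

Under these assumptions the interval is centred on the predicted mean, and its width grows with both the predicted variance and the requested coverage.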

Evaluating predictions

# step 5: specifying evaluation metric
from skpro.metrics import CRPS

metric = CRPS()  # continuous ranked probability score - any skpro metric works!

# step 6: evaluate metric, compare predictions to actuals
metric(y_test, y_pred_proba)
# example output: 32.19 (the exact value varies between runs and train/test splits)
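
For intuition about what the CRPS measures, here is a minimal sketch of its closed form for a normal predictive distribution, CRPS = sigma * (z * (2 * Phi(z) - 1) + 2 * phi(z) - 1 / sqrt(pi)) with z = (y - mu) / sigma, averaged over instances. The names `crps_normal` and `mean_crps` and the toy numbers are assumptions for illustration; skpro's `CRPS` metric works on distribution objects and is not implemented this way.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi (cdf) and phi (pdf)


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) prediction against observation y."""
    z = (y - mu) / sigma
    return sigma * (
        z * (2 * _STD_NORMAL.cdf(z) - 1)
        + 2 * _STD_NORMAL.pdf(z)
        - 1 / math.sqrt(math.pi)
    )


def mean_crps(mus, sigmas, ys):
    """Average CRPS over several instances; lower is better."""
    scores = [crps_normal(m, s, y) for m, s, y in zip(mus, sigmas, ys)]
    return sum(scores) / len(scores)


# hypothetical predictions (means, standard deviations) and observed values
score = mean_crps([150.0, 90.0], [20.0, 10.0], [160.0, 85.0])
```

The CRPS is expressed in the units of the target and rewards predictions that are both accurate and sharp. As sigma shrinks towards zero it approaches the absolute error |y - mu|, which makes it comparable to the mean absolute error of a point prediction.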

👋 How to get involved

There are many ways to get involved with development of skpro, which is developed by the sktime community. We follow the all-contributors specification: all kinds of contributions are welcome - not just code.

Documentation
💝 Contribute | How to contribute to skpro.
🎒 Mentoring | New to open source? Apply to our mentoring program!
📅 Meetings | Join our discussions, tutorials, workshops, and sprints!
👩‍🔧 Developer Guides | How to further develop the skpro code base.
🏅 Contributors | A list of all contributors.
🙋 Roles | An overview of our core community roles.
💸 Donate | Fund sktime and skpro maintenance and development.
🏛️ Governance | How and by whom decisions are made in the sktime community.

👋 Citation

To cite skpro in a scientific publication, see citations.