Version 2.6.0 out now! Read the release notes here.
`skpro` is a library for supervised probabilistic prediction in Python.
It provides `scikit-learn`-like, `scikit-base` compatible interfaces to:
- tabular supervised regressors for probabilistic prediction - interval, quantile and distribution predictions
- tabular probabilistic time-to-event and survival prediction - instance-individual survival distributions
- metrics to evaluate probabilistic predictions, e.g., pinball loss, empirical coverage, CRPS, survival losses
- reductions to turn `scikit-learn` regressors into probabilistic `skpro` regressors, such as bootstrap or conformal
- building pipelines and composite models, including tuning via probabilistic performance metrics
- symbolic probability distributions with value domain of `pandas.DataFrame`-s and `pandas`-like interface
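
To make two of the metrics named above concrete, here is a minimal plain-Python sketch of pinball loss and empirical coverage. The function names `pinball_loss` and `empirical_coverage` and the toy numbers are illustrative assumptions, not part of the `skpro` API; in practice the metric classes shipped with `skpro` should be used instead.

```python
def pinball_loss(y_true, y_quantile, alpha):
    """Mean pinball (quantile) loss of quantile predictions at level alpha."""
    total = 0.0
    for y, q in zip(y_true, y_quantile):
        diff = y - q
        # under-prediction is weighted by alpha, over-prediction by (1 - alpha)
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside their predicted interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# toy data, made up for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]
q50 = [1.5, 1.5, 3.0, 5.0]      # predicted medians
lower = [0.0, 2.5, 2.0, 3.0]    # predicted interval lower bounds
upper = [2.0, 3.5, 4.0, 5.0]    # predicted interval upper bounds

loss = pinball_loss(y_true, q50, alpha=0.5)
coverage = empirical_coverage(y_true, lower, upper)
print(loss, coverage)
```

For a well-calibrated interval predictor, the empirical coverage should be close to the nominal coverage requested at prediction time.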
Documentation | |
---|---|
Tutorials | New to skpro? Here's everything you need to know! |
Binder Notebooks | Example notebooks to play with in your browser. |
User Guides | How to use skpro and its features. |
Extension Templates | How to build your own estimator using skpro's API. |
API Reference | The detailed reference for skpro's API. |
Changelog | Changes and version history. |
Roadmap | skpro's software and community development plan. |
Related Software | A list of related software. |
Questions and feedback are extremely welcome! We strongly believe in the value of sharing help publicly, as it allows a wider audience to benefit from it.
`skpro` is maintained by the `sktime` community; we use the same social channels.
Type | Platforms |
---|---|
Bug Reports | GitHub Issue Tracker |
Feature Requests & Ideas | GitHub Issue Tracker |
Usage Questions | GitHub Discussions · Stack Overflow |
General Discussion | GitHub Discussions |
Contribution & Development | dev-chat channel · Discord |
Community collaboration session | Discord - Fridays 13 UTC, dev/meet-ups channel |
Our objective is to enhance the interoperability and usability of the AI model ecosystem:
- `skpro` is compatible with `scikit-learn` and `sktime`, e.g., an `sktime` proba forecaster can be built with an `skpro` proba regressor, which is an `sklearn` regressor with proba mode added by `skpro`
- `skpro` provides a mini-package management framework for first-party implementations, and for interfacing popular second- and third-party components, such as cyclic-boosting, MAPIE, or ngboost packages.

`skpro` curates libraries of components of the following types:
Module | Status | Links |
---|---|---|
Probabilistic tabular regression | maturing | Tutorial · API Reference · Extension Template |
Time-to-event (survival) prediction | maturing | Tutorial · API Reference · Extension Template |
Performance metrics | maturing | API Reference |
Probability distributions | maturing | Tutorial · API Reference · Extension Template |
To install `skpro`, use `pip`:
pip install skpro
or, with maximum dependencies,
pip install skpro[all_extras]
Releases are available as source packages and binary wheels. You can see all available wheels here.
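
After installing, one way to confirm which version is available is to query the package metadata from Python. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical and not something `skpro` provides.

```python
from importlib import metadata


def installed_version(package="skpro"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("skpro is not installed" if version is None else f"skpro {version}")
```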
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from skpro.regression.residual import ResidualDouble
# step 1: data specification
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_new, y_train, y_test = train_test_split(X, y)
# step 2: specifying the regressor - any compatible regressor is valid!
# example - "squaring residuals" regressor
# random forest for mean prediction
# linear regression for variance prediction
reg_mean = RandomForestRegressor()
reg_resid = LinearRegression()
reg_proba = ResidualDouble(reg_mean, reg_resid)
# step 3: fitting the model to training data
reg_proba.fit(X_train, y_train)
# step 4: predicting labels on new data
# probabilistic prediction modes - pick any or multiple
# full distribution prediction
y_pred_proba = reg_proba.predict_proba(X_new)
# interval prediction
y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)
# quantile prediction
y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])
# variance prediction
y_pred_var = reg_proba.predict_var(X_new)
# mean prediction is same as "classical" sklearn predict, also available
y_pred_mean = reg_proba.predict(X_new)
# step 5: specifying evaluation metric
from skpro.metrics import CRPS
metric = CRPS()  # continuous ranked probability score - any skpro metric works!
# step 6: evaluate metric, compare predictions to actuals
metric(y_test, y_pred_proba)
# example output: 32.19 (the exact value varies with the random split and the random forest)
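
To give some intuition for the number CRPS reports, the sketch below evaluates the well-known closed-form CRPS of a normal forecast at a single observation, in plain Python. It is an illustration only, not how `skpro` computes the metric; the function name `crps_normal` and the example values are assumptions made for this sketch.

```python
import math


def crps_normal(y, mu, sigma):
    """Closed-form CRPS of a Normal(mu, sigma) forecast at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# same observation, three hypothetical forecasts
sharp_hit = crps_normal(y=100.0, mu=100.0, sigma=10.0)   # centred and narrow
wide_hit = crps_normal(y=100.0, mu=100.0, sigma=50.0)    # centred but very uncertain
sharp_miss = crps_normal(y=150.0, mu=100.0, sigma=10.0)  # narrow but far off

print(sharp_hit, wide_hit, sharp_miss)
```

Lower is better: the score is in the units of the target and rewards forecasts that are both well centred and sharp, so the narrow, centred forecast scores best and the confident miss scores worst.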
There are many ways to get involved with development of `skpro`, which is developed by the `sktime` community.
We follow the `all-contributors` specification: all kinds of contributions are welcome - not just code.
Documentation | |
---|---|
Contribute | How to contribute to skpro. |
Mentoring | New to open source? Apply to our mentoring program! |
Meetings | Join our discussions, tutorials, workshops, and sprints! |
Developer Guides | How to further develop the skpro code base. |
Contributors | A list of all contributors. |
Roles | An overview of our core community roles. |
Donate | Fund sktime and skpro maintenance and development. |
Governance | How and by whom decisions are made in the sktime community. |
To cite `skpro` in a scientific publication, see citations.