The ML Paper Package (mlpaper)

Easy benchmarking of machine learning models that follow the sklearn interface, with statistical tests built in.

Train, test, and evaluate models on multiple loss functions. Full result tables with error bars and significance tests are a one-liner for sklearn compatible objects. The design is documented in a workshop paper and poster.

Installation

Only Python>=3.5 is officially supported, but older versions of Python likely work as well.

The core package itself can be installed with:

pip install mlpaper

To also get the dependencies for the demos in the README, install with:

pip install mlpaper[demo]

See the GitHub, PyPI, and Read the Docs.

Executive summary

Classification uses mlpaper.classification
Regression uses mlpaper.regression
We use Bayes' decision rule to convert a predictive distribution to an action for each loss function
Objects only need to support the methods fit and predict_log_proba (sklearn interface)
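
The Bayes' decision rule mentioned above picks, for each test point, the action with the lowest expected loss under the predictive distribution. The following is a minimal NumPy sketch of that idea; `bayes_action` and the two example loss matrices are hypothetical names used for illustration and are not part of the mlpaper API.

```python
import numpy as np


def bayes_action(log_prob, loss_matrix):
    """Pick the action with the lowest expected loss for each test point.

    log_prob: (n_points, n_classes) predictive log-probabilities.
    loss_matrix: (n_classes, n_actions) loss of each action given the true class.
    (Hypothetical helper, not an mlpaper function.)
    """
    prob = np.exp(log_prob)
    expected_loss = prob @ loss_matrix  # shape (n_points, n_actions)
    return np.argmin(expected_loss, axis=1)


log_prob = np.log(np.array([[0.7, 0.3], [0.4, 0.6]]))

# Zero-one loss: the best action is simply the most probable class.
zero_one = np.array([[0.0, 1.0], [1.0, 0.0]])
print(bayes_action(log_prob, zero_one))  # [0 1]

# Asymmetric loss: choosing action 0 when the truth is class 1 costs 5x more,
# so action 0 is only chosen when class 0 is very likely.
asymmetric = np.array([[0.0, 1.0], [5.0, 0.0]])
print(bayes_action(log_prob, asymmetric))  # [1 1]
```

The same predictive distribution leads to different actions under different losses, which is why a single set of log-probabilities can be scored against several loss functions.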

Modular pieces:

The "do-it-all" just_benchmark calls 3 modular routines
get_pred_log_prob: predictive distributions on each test point and model
loss_table: the losses for each prediction
loss_summary_table: mean loss for each method and error bars/p-values

Sciprint:

Publishable results: format a results dataframe for (LaTeX) publication
Cleanly formatted: correct significant figures, shifting of exponent for compactness, and correct alignment of decimal points, units in headers
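
The compact `mean(error)` notation used in the tables of this README can be reproduced with a few lines of standard-library code. The sketch below is not the sciprint implementation; `format_with_error` is a hypothetical helper that keeps two significant digits of the error (rounded up, which is consistent with the tables shown later) and rounds the mean to the same decimal place. It assumes a positive error smaller than 10.

```python
from decimal import ROUND_CEILING, ROUND_HALF_EVEN, Decimal


def format_with_error(mean, error, digits=2):
    """Format mean and error as e.g. 0.42(14). Hypothetical helper."""
    if error <= 0:
        raise ValueError("error must be positive")
    err = Decimal(str(error))
    # Exponent of the last significant digit that is kept in the error.
    exp = err.adjusted() - (digits - 1)
    quantum = Decimal(1).scaleb(exp)
    err_q = err.quantize(quantum, rounding=ROUND_CEILING)
    if err_q.adjusted() > err.adjusted():
        # Rounding up added a digit (e.g. 0.0996 -> 0.100), so drop one place.
        exp += 1
        quantum = Decimal(1).scaleb(exp)
        err_q = err.quantize(quantum, rounding=ROUND_CEILING)
    mean_q = Decimal(str(mean)).quantize(quantum, rounding=ROUND_HALF_EVEN)
    err_digits = int(err_q.scaleb(-exp))  # error expressed in units of the last digit
    return f"{mean_q}({err_digits})"


print(format_with_error(0.415492, 0.138707))  # 0.42(14)
print(format_with_error(0.368357, 0.079299))  # 0.368(80)
print(format_with_error(2.763103, 1.789881))  # 2.8(18)
```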

Data splitter:

Supports random, ordinal, or temporal splitting across features in pandas dataframes
Jointly splitting across multiple features to test difficult generalization cases
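
As an illustration of joint splitting, the pandas sketch below combines a temporal split with a held-out group. This is one reading of the idea and not the mlpaper data splitter API: `joint_split`, the column names, and the toy data are all hypothetical. Rows that fall on the training side of one feature and the test side of the other are discarded, so the test set is new in both time and user.

```python
import numpy as np
import pandas as pd


def joint_split(df, time_col, cutoff, group_col, test_groups):
    """Split on a time cutoff and a held-out group at once (hypothetical helper)."""
    late = df[time_col] >= cutoff
    held_out = df[group_col].isin(test_groups)
    train = df[~late & ~held_out]  # early rows from training groups only
    test = df[late & held_out]  # late rows from held-out groups only
    return train, test


df = pd.DataFrame(
    {
        "day": [1, 2, 3, 4, 5, 6],
        "user": ["a", "b", "a", "b", "c", "c"],
        "x": np.arange(6.0),
    }
)

train, test = joint_split(df, "day", cutoff=4, group_col="user", test_groups={"b", "c"})
print(train)
print(test)
```

In this toy example the row for user "b" on day 2 is dropped, because it is early in time but belongs to a held-out user.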

Evaluation framework:

Two metric types: loss functions and curve summaries
Curve summaries: AUC for ROC, PR, and PRG
Built-in proper scoring rules: log loss, Brier loss, spherical loss
General loss matrices, and new metrics are easily added
Non-probabilistic methods usable by pipelining a calibrator
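
For reference, the three proper scoring rules named above can be written in a few lines of NumPy. These are the textbook forms, computed per test point from predictive log-probabilities; the function names are hypothetical, and the package's own implementations may differ, for example by an affine rescaling.

```python
import numpy as np


def log_loss(log_prob, y):
    """Negative log-probability assigned to the true class."""
    return -log_prob[np.arange(len(y)), y]


def brier_loss(log_prob, y):
    """Squared distance between the predicted probabilities and the one-hot label."""
    prob = np.exp(log_prob)
    one_hot = np.eye(prob.shape[1])[y]
    return np.sum((prob - one_hot) ** 2, axis=1)


def spherical_loss(log_prob, y):
    """Negative spherical score: true-class probability over the L2 norm."""
    prob = np.exp(log_prob)
    return -prob[np.arange(len(y)), y] / np.linalg.norm(prob, axis=1)


log_prob = np.log(np.array([[0.5, 0.5], [0.9, 0.1]]))
y = np.array([0, 0])

print(log_loss(log_prob, y))
print(brier_loss(log_prob, y))
print(spherical_loss(log_prob, y))
```

In each case the confident and correct second prediction receives a lower loss than the uninformative first one.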

Error bars and significance tests:

Place confidence interval (CI) on mean loss of infinite test set from the same distribution
Three options for CI in loss_summary_table: t-test, bootstrap, and Bernstein bound
The p-values are designed to match the error bars (via the 3 methods)

Error bars on curves:

CI on raw curves (for plotting) and AUC (for tables) via bootstrap
Vectorized bootstrap: reweight data points via multinomial distribution
Avoids re-creating the data sets in memory (very slow)
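
The multinomial reweighting trick can be sketched as follows. Instead of building each resampled data set, we draw the number of times each point appears in a resample and use those counts as weights. This is a minimal illustration for the mean of a loss vector with a percentile interval; `bootstrap_mean_ci` is a hypothetical name and the details of the mlpaper implementation may differ.

```python
import numpy as np


def bootstrap_mean_ci(losses, n_boot=1000, confidence=0.95, seed=0):
    """Percentile bootstrap CI on the mean, using multinomial weights."""
    rng = np.random.default_rng(seed)
    n = len(losses)
    # Row b holds how many times each data point appears in resample b.
    weights = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    # One matrix-vector product gives the mean of every resample at once.
    boot_means = weights @ losses / n
    alpha = 1.0 - confidence
    lower, upper = np.quantile(boot_means, [alpha / 2, 1.0 - alpha / 2])
    return lower, upper


rng = np.random.default_rng(1)
losses = rng.exponential(size=200)
lower, upper = bootstrap_mean_ci(losses)
print(lower, losses.mean(), upper)
```

The weight matrix has shape (n_boot, n), so no copy of the data is made per resample.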

Usage for classification problems

First, we consider the plot_classifier_comparison.py demo file. It extends the standard sklearn classifier comparison and also demonstrates how easily mlpaper can create a performance report.

The mlpaper package is meant to benchmark any model with any provided data set. However, in this demo, we use the example of the three toy data sets and ten classifiers from the sklearn example.

The mlpaper package can benchmark all of these methods and create a properly formatted LaTeX table (with error bars) in a few commands. The resulting table can be copied and pasted directly into the .tex file of an ML paper.

Pandas tables with the performance results of all the methods can be built by:

import mlpaper.classification as btc
from mlpaper.classification import STD_BINARY_CURVES, STD_CLASS_LOSS

performance_df, performance_curves_dict = btc.just_benchmark(
    X_train,
    y_train,
    X_test,
    y_test,
    2,
    classifiers,
    STD_CLASS_LOSS,
    STD_BINARY_CURVES,
    ref_method,
)

This benchmarks all the models in classifiers on the data (X_train, y_train, X_test, y_test) for 2-class classification. It uses the loss functions described in the dictionary STD_CLASS_LOSS, and the curves (e.g., ROC, PR) in STD_BINARY_CURVES. The ref_method defines the model that is the reference to compare against for assessing statistically significant performance gains.

The sciprint module formats these tables for scientific presentation. The performance tables can be converted to cleanly formatted tables: correct significant figures, shifting of exponent for compactness, thresholding huge/small (crap limit) results, correct alignment of decimal points, units in headers, etc. Here we use:

import mlpaper.sciprint as sp

print(
    sp.just_format_it(
        performance_df,
        shift_mod=3,
        unit_dict={"NLL": "nats"},
        crap_limit_min={"AUPRG": -1},
        EB_limit={"AUPRG": -1},
        non_finite_fmt={sp.NAN_STR: "N/A"},
        use_tex=False,
    )
)

to export the results in plain text, or for LaTeX we use:

import mlpaper.sciprint as sp

print(
    sp.just_format_it(
        performance_df,
        shift_mod=3,
        unit_dict={"NLL": "nats"},
        crap_limit_min={"AUPRG": -1},
        EB_limit={"AUPRG": -1},
        non_finite_fmt={sp.NAN_STR: "{--}"},
        use_tex=True,
    )
)

Output

Dataset 0 Raw Results (Moons)

Here we show the input to just_format_it (print(performance_df.to_string())):

metric                Brier                               NLL                            sphere                         zero_one                           AUC                       AP                    AUPRG
stat                   mean     error             p      mean     error             p      mean     error             p     mean     error         p      mean     error    p      mean     error    p      mean     error    p
method
AdaBoost           0.415492  0.138707  1.386332e-10  0.368357  0.079299  2.946082e-10  0.363273  0.147183  7.040699e-11    0.075  0.085310  0.000008  0.949875  0.095655  0.0  0.933245  0.154225  0.0  0.904640  0.227702  0.0
Decision Tree      0.177778  0.242857  5.124429e-08  0.403857  0.701531  4.071101e-01  0.158944  0.218431  3.489955e-09    0.050  0.070590  0.000012  0.966165  0.071165  0.0  0.947368  0.123839  0.0  0.938596  0.154283  0.0
Gaussian Process   0.265248  0.160014  3.628068e-11  0.273804  0.104741  9.779350e-10  0.216574  0.154083  2.912358e-12    0.025  0.050567  0.000001  0.952381  0.105834  0.0  0.897840  0.224560  0.0  0.920814  0.198315  0.0
Linear SVM         0.334650  0.248373  3.153531e-06  0.282571  0.170047  1.720037e-05  0.311622  0.239091  8.783367e-07    0.125  0.107116  0.000116  0.949875  0.075188  0.0  0.951728  0.095365  0.0  0.887049  0.222059  0.0
Naive Bayes        0.339865  0.248629  3.457673e-06  0.282526  0.178926  3.465523e-05  0.313773  0.233882  5.719445e-07    0.125  0.107116  0.000116  0.957393  0.072682  0.0  0.957084  0.098593  0.0  0.897823  0.186842  0.0
Nearest Neighbors  0.177778  0.205603  1.064302e-09  0.416345  0.696712  4.240499e-01  0.148434  0.175058  8.504074e-12    0.025  0.050567  0.000001  0.968672  0.073935  0.0  0.944444  0.111111  0.0  0.934985  0.162257  0.0
Neural Net         0.324146  0.222908  3.134170e-07  0.278736  0.145830  1.091201e-06  0.297476  0.216746  8.206739e-08    0.125  0.107116  0.000116  0.959900  0.072432  0.0  0.961052  0.080379  0.0  0.915010  0.204456  0.0
QDA                0.338089  0.262604  8.712525e-06  0.285470  0.206876  2.761767e-04  0.313055  0.243018  1.225787e-06    0.150  0.115652  0.000530  0.949875  0.077694  0.0  0.950718  0.098284  0.0  0.885171  0.192649  0.0
RBF SVM            0.146465  0.189716  5.131397e-11  0.173264  0.167918  2.510477e-07  0.120762  0.167803  9.753115e-13    0.025  0.050567  0.000001  0.957393  0.119010  0.0  0.925618  0.183161  0.0  0.920814  0.211212  0.0
Random Forest      0.305017  0.221354  1.639340e-07  0.264840  0.149891  9.905010e-07  0.273350  0.211773  2.624395e-08    0.075  0.085310  0.000008  0.966165  0.068922  0.0  0.975701  0.057849  0.0  0.956003  0.141548  0.0
iid                1.004444  0.021566           NaN  0.695370  0.010787           NaN  1.005362  0.026018           NaN    0.525  0.161742       NaN  0.500000  0.000000  NaN  0.525000  0.150000  NaN  0.000000  0.000000  NaN

Dataset 0 (Moons)

Here we show the output of just_format_it:

                          AP        p        AUC        p    AUPRG        p      Brier        p NLL (nats)        p     sphere        p   zero one        p
AdaBoost           0.93(16)   <0.0001  0.950(96)  <0.0001  0.90464  <0.0001  0.42(14)   <0.0001  0.368(80)  <0.0001  0.36(15)   <0.0001  0.075(86)  <0.0001
Decision Tree      0.95(13)   <0.0001  0.966(72)  <0.0001  0.93860  <0.0001  0.18(25)   <0.0001  0.40(71)    0.4072  0.16(22)   <0.0001  0.050(71)  <0.0001
Gaussian Process   0.90(23)   <0.0001  0.95(11)   <0.0001  0.92081  <0.0001  0.27(17)   <0.0001  0.27(11)   <0.0001  0.22(16)   <0.0001  0.025(51)  <0.0001
Linear SVM         0.952(96)  <0.0001  0.950(76)  <0.0001  0.88705  <0.0001  0.33(25)   <0.0001  0.28(18)   <0.0001  0.31(24)   <0.0001  0.13(11)    0.0002
Naive Bayes        0.957(99)  <0.0001  0.957(73)  <0.0001  0.89782  <0.0001  0.34(25)   <0.0001  0.28(18)   <0.0001  0.31(24)   <0.0001  0.13(11)    0.0002
Nearest Neighbors  0.94(12)   <0.0001  0.969(74)  <0.0001  0.93498  <0.0001  0.18(21)   <0.0001  0.42(70)    0.4241  0.15(18)   <0.0001  0.025(51)  <0.0001
Neural Net         0.961(81)  <0.0001  0.960(73)  <0.0001  0.91501  <0.0001  0.32(23)   <0.0001  0.28(15)   <0.0001  0.30(22)   <0.0001  0.13(11)    0.0002
QDA                0.951(99)  <0.0001  0.950(78)  <0.0001  0.88517  <0.0001  0.34(27)   <0.0001  0.29(21)    0.0003  0.31(25)   <0.0001  0.15(12)    0.0006
RBF SVM            0.93(19)   <0.0001  0.96(12)   <0.0001  0.92081  <0.0001  0.15(19)   <0.0001  0.17(17)   <0.0001  0.12(17)   <0.0001  0.025(51)  <0.0001
Random Forest      0.976(58)  <0.0001  0.966(69)  <0.0001  0.95600  <0.0001  0.31(23)   <0.0001  0.26(15)   <0.0001  0.27(22)   <0.0001  0.075(86)  <0.0001
iid                0.53(15)       N/A  0.5(0)         N/A  0(0)         N/A  1.004(22)      N/A  0.695(11)      N/A  1.005(27)      N/A  0.53(17)       N/A

Dataset 0 (Moons) in LaTeX

Here we show the output of just_format_it with use_tex=True:

\begin{tabular}{|l|Sr|Sr|Sr|Sr|Sr|Sr|Sr|}
\toprule
{} &                      {AP} &      {p} &      {AUC} &      {p} &  {AUPRG} &      {p} &    {Brier} &      {p} & {NLL (nats)} &      {p} &   {sphere} &      {p} & {zero one} &      {p} \\
\midrule
AdaBoost          &  0.93(16)  &  <0.0001 &  0.950(96) &  <0.0001 &  0.90464 &  <0.0001 &  0.42(14)  &  <0.0001 &    0.368(80) &  <0.0001 &  0.36(15)  &  <0.0001 &  0.075(86) &  <0.0001 \\
Decision Tree     &  0.95(13)  &  <0.0001 &  0.966(72) &  <0.0001 &  0.93860 &  <0.0001 &  0.18(25)  &  <0.0001 &    0.40(71)  &   0.4072 &  0.16(22)  &  <0.0001 &  0.050(71) &  <0.0001 \\
Gaussian Process  &  0.90(23)  &  <0.0001 &  0.95(11)  &  <0.0001 &  0.92081 &  <0.0001 &  0.27(17)  &  <0.0001 &    0.27(11)  &  <0.0001 &  0.22(16)  &  <0.0001 &  0.025(51) &  <0.0001 \\
Linear SVM        &  0.952(96) &  <0.0001 &  0.950(76) &  <0.0001 &  0.88705 &  <0.0001 &  0.33(25)  &  <0.0001 &    0.28(18)  &  <0.0001 &  0.31(24)  &  <0.0001 &  0.13(11)  &   0.0002 \\
Naive Bayes       &  0.957(99) &  <0.0001 &  0.957(73) &  <0.0001 &  0.89782 &  <0.0001 &  0.34(25)  &  <0.0001 &    0.28(18)  &  <0.0001 &  0.31(24)  &  <0.0001 &  0.13(11)  &   0.0002 \\
Nearest Neighbors &  0.94(12)  &  <0.0001 &  0.969(74) &  <0.0001 &  0.93498 &  <0.0001 &  0.18(21)  &  <0.0001 &    0.42(70)  &   0.4241 &  0.15(18)  &  <0.0001 &  0.025(51) &  <0.0001 \\
Neural Net        &  0.961(81) &  <0.0001 &  0.960(73) &  <0.0001 &  0.91501 &  <0.0001 &  0.32(23)  &  <0.0001 &    0.28(15)  &  <0.0001 &  0.30(22)  &  <0.0001 &  0.13(11)  &   0.0002 \\
QDA               &  0.951(99) &  <0.0001 &  0.950(78) &  <0.0001 &  0.88517 &  <0.0001 &  0.34(27)  &  <0.0001 &    0.29(21)  &   0.0003 &  0.31(25)  &  <0.0001 &  0.15(12)  &   0.0006 \\
RBF SVM           &  0.93(19)  &  <0.0001 &  0.96(12)  &  <0.0001 &  0.92081 &  <0.0001 &  0.15(19)  &  <0.0001 &    0.17(17)  &  <0.0001 &  0.12(17)  &  <0.0001 &  0.025(51) &  <0.0001 \\
Random Forest     &  0.976(58) &  <0.0001 &  0.966(69) &  <0.0001 &  0.95600 &  <0.0001 &  0.31(23)  &  <0.0001 &    0.26(15)  &  <0.0001 &  0.27(22)  &  <0.0001 &  0.075(86) &  <0.0001 \\
iid               &  0.53(15)  &     {--} &  0.5(0)    &     {--} &  0(0)    &     {--} &  1.004(22) &     {--} &    0.695(11) &     {--} &  1.005(27) &     {--} &  0.53(17)  &     {--} \\
\bottomrule
\end{tabular}

Dataset 1 Raw Results (Circles)

metric                Brier                               NLL                            sphere                         zero_one                               AUC                         AP                      AUPRG
stat                   mean     error             p      mean     error             p      mean     error             p     mean     error             p      mean     error      p      mean     error      p      mean     error      p
method
AdaBoost           0.772573  0.095313  2.033552e-07  0.576206  0.049498  1.935422e-07  0.734630  0.110164  2.279943e-07    0.175  0.123067  3.886877e-06  0.885417  0.117417  0.000  0.938284  0.095521  0.000  0.760908  0.492188  0.004
Decision Tree      0.799998  0.518223  3.008083e-01  2.763103  1.789881  2.691681e-02  0.682842  0.442331  7.918040e-02    0.200  0.129556  2.738574e-04  0.802083  0.143964  0.000  0.863636  0.163636  0.000  0.763158  0.266426  0.000
Gaussian Process   0.390730  0.221014  1.309465e-07  0.327736  0.134797  2.622545e-07  0.361218  0.224875  6.001903e-08    0.100  0.097167  2.365995e-07  0.963542  0.066106  0.000  0.977432  0.047043  0.000  0.930490  0.217950  0.000
Linear SVM         1.022831  0.032154  7.027710e-02  0.704573  0.016091  7.017962e-02  1.027522  0.038764  7.042062e-02    0.600  0.158673  1.000000e+00  0.513021  0.203687  0.942  0.531643  0.175163  0.194  0.197563  0.390902  0.344
Naive Bayes        0.644184  0.192038  3.242921e-07  0.478220  0.110889  2.871541e-07  0.630224  0.206960  4.057918e-07    0.300  0.148425  2.101106e-04  0.997396  0.013396  0.000  0.998264  0.008681  0.000  0.995747  0.030182  0.000
Nearest Neighbors  0.300000  0.152301  5.949906e-11  0.234446  0.100982  4.246213e-11  0.276718  0.158441  1.125534e-10    0.075  0.085310  5.310307e-07  0.966146  0.049479  0.000  0.996377  0.012940  0.000  0.990702  0.051036  0.000
Neural Net         0.699274  0.138407  2.892746e-09  0.532132  0.073755  3.119226e-09  0.664108  0.155756  3.187473e-09    0.275  0.144621  9.983420e-05  0.992188  0.025155  0.000  0.995192  0.019231  0.000  0.987240  0.055882  0.000
QDA                0.629840  0.182293  4.465387e-08  0.473008  0.104901  4.571531e-08  0.612127  0.196927  5.707883e-08    0.275  0.144621  9.983420e-05  0.997396  0.013021  0.000  0.998264  0.010029  0.000  0.995747  0.026592  0.000
RBF SVM            0.387512  0.207708  3.157955e-08  0.331539  0.128314  9.742683e-08  0.356649  0.210642  1.440976e-08    0.125  0.107116  6.271107e-07  0.966146  0.059580  0.000  0.979187  0.045865  0.000  0.936801  0.196317  0.000
Random Forest      0.657978  0.206179  3.062032e-05  0.479941  0.119849  2.282042e-05  0.650341  0.222052  3.599606e-05    0.350  0.154486  8.725736e-04  0.945312  0.081904  0.000  0.970699  0.055514  0.000  0.905713  0.269476  0.000
iid                1.071111  0.084626           NaN  0.728942  0.042566           NaN  1.084992  0.101256           NaN    0.600  0.158673           NaN  0.500000  0.000000    NaN  0.600000  0.175000    NaN  0.000000  0.000000    NaN

Dataset 1 (Circles)

                           AP        p        AUC        p      AUPRG        p      Brier        p NLL (nats)        p     sphere        p   zero one        p
AdaBoost           0.938(96)   <0.0001  0.89(12)   <0.0001  0.76091     0.0041  0.773(96)  <0.0001  0.576(50)  <0.0001  0.73(12)   <0.0001  0.17(13)   <0.0001
Decision Tree      0.86(17)    <0.0001  0.80(15)   <0.0001  0.76316    <0.0001  0.80(52)    0.3009  2.8(18)     0.0270  0.68(45)    0.0792  0.20(13)    0.0003
Gaussian Process   0.977(48)   <0.0001  0.964(67)  <0.0001  0.93049    <0.0001  0.39(23)   <0.0001  0.33(14)   <0.0001  0.36(23)   <0.0001  0.100(98)  <0.0001
Linear SVM         0.53(18)     0.1941  0.51(21)    0.9420  0.19756     0.3440  1.023(33)   0.0703  0.705(17)   0.0702  1.028(39)   0.0705  0.60(16)    1.0000
Naive Bayes        0.9983(87)  <0.0001  0.997(14)  <0.0001  0.996(31)  <0.0001  0.64(20)   <0.0001  0.48(12)   <0.0001  0.63(21)   <0.0001  0.30(15)    0.0003
Nearest Neighbors  0.996(13)   <0.0001  0.966(50)  <0.0001  0.991(52)  <0.0001  0.30(16)   <0.0001  0.23(11)   <0.0001  0.28(16)   <0.0001  0.075(86)  <0.0001
Neural Net         0.995(20)   <0.0001  0.992(26)  <0.0001  0.987(56)  <0.0001  0.70(14)   <0.0001  0.532(74)  <0.0001  0.66(16)   <0.0001  0.28(15)   <0.0001
QDA                0.998(11)   <0.0001  0.997(14)  <0.0001  0.996(27)  <0.0001  0.63(19)   <0.0001  0.47(11)   <0.0001  0.61(20)   <0.0001  0.28(15)   <0.0001
RBF SVM            0.979(46)   <0.0001  0.966(60)  <0.0001  0.93680    <0.0001  0.39(21)   <0.0001  0.33(13)   <0.0001  0.36(22)   <0.0001  0.13(11)   <0.0001
Random Forest      0.971(56)   <0.0001  0.945(82)  <0.0001  0.90571    <0.0001  0.66(21)   <0.0001  0.48(12)   <0.0001  0.65(23)   <0.0001  0.35(16)    0.0009
iid                0.60(18)        N/A  0.5(0)         N/A  0(0)           N/A  1.071(85)      N/A  0.729(43)      N/A  1.08(11)       N/A  0.60(16)       N/A

Dataset 1 (Circles) in LaTeX

\begin{tabular}{|l|Sr|Sr|Sr|Sr|Sr|Sr|Sr|}
\toprule
{} &                       {AP} &      {p} &      {AUC} &      {p} &    {AUPRG} &      {p} &    {Brier} &      {p} & {NLL (nats)} &      {p} &   {sphere} &      {p} & {zero one} &      {p} \\
\midrule
AdaBoost          &  0.938(96)  &  <0.0001 &  0.89(12)  &  <0.0001 &  0.76091   &   0.0041 &  0.773(96) &  <0.0001 &    0.576(50) &  <0.0001 &  0.73(12)  &  <0.0001 &  0.17(13)  &  <0.0001 \\
Decision Tree     &  0.86(17)   &  <0.0001 &  0.80(15)  &  <0.0001 &  0.76316   &  <0.0001 &  0.80(52)  &   0.3009 &    2.8(18)   &   0.0270 &  0.68(45)  &   0.0792 &  0.20(13)  &   0.0003 \\
Gaussian Process  &  0.977(48)  &  <0.0001 &  0.964(67) &  <0.0001 &  0.93049   &  <0.0001 &  0.39(23)  &  <0.0001 &    0.33(14)  &  <0.0001 &  0.36(23)  &  <0.0001 &  0.100(98) &  <0.0001 \\
Linear SVM        &  0.53(18)   &   0.1941 &  0.51(21)  &   0.9420 &  0.19756   &   0.3440 &  1.023(33) &   0.0703 &    0.705(17) &   0.0702 &  1.028(39) &   0.0705 &  0.60(16)  &   1.0000 \\
Naive Bayes       &  0.9983(87) &  <0.0001 &  0.997(14) &  <0.0001 &  0.996(31) &  <0.0001 &  0.64(20)  &  <0.0001 &    0.48(12)  &  <0.0001 &  0.63(21)  &  <0.0001 &  0.30(15)  &   0.0003 \\
Nearest Neighbors &  0.996(13)  &  <0.0001 &  0.966(50) &  <0.0001 &  0.991(52) &  <0.0001 &  0.30(16)  &  <0.0001 &    0.23(11)  &  <0.0001 &  0.28(16)  &  <0.0001 &  0.075(86) &  <0.0001 \\
Neural Net        &  0.995(20)  &  <0.0001 &  0.992(26) &  <0.0001 &  0.987(56) &  <0.0001 &  0.70(14)  &  <0.0001 &    0.532(74) &  <0.0001 &  0.66(16)  &  <0.0001 &  0.28(15)  &  <0.0001 \\
QDA               &  0.998(11)  &  <0.0001 &  0.997(14) &  <0.0001 &  0.996(27) &  <0.0001 &  0.63(19)  &  <0.0001 &    0.47(11)  &  <0.0001 &  0.61(20)  &  <0.0001 &  0.28(15)  &  <0.0001 \\
RBF SVM           &  0.979(46)  &  <0.0001 &  0.966(60) &  <0.0001 &  0.93680   &  <0.0001 &  0.39(21)  &  <0.0001 &    0.33(13)  &  <0.0001 &  0.36(22)  &  <0.0001 &  0.13(11)  &  <0.0001 \\
Random Forest     &  0.971(56)  &  <0.0001 &  0.945(82) &  <0.0001 &  0.90571   &  <0.0001 &  0.66(21)  &  <0.0001 &    0.48(12)  &  <0.0001 &  0.65(23)  &  <0.0001 &  0.35(16)  &   0.0009 \\
iid               &  0.60(18)   &     {--} &  0.5(0)    &     {--} &  0(0)      &     {--} &  1.071(85) &     {--} &    0.729(43) &     {--} &  1.08(11)  &     {--} &  0.60(16)  &     {--} \\
\bottomrule
\end{tabular}

Dataset 2 Raw Results (Linear)

metric                Brier                               NLL                            sphere                         zero_one                               AUC                       AP                    AUPRG
stat                   mean     error             p      mean     error             p      mean     error             p     mean     error             p      mean     error    p      mean     error    p      mean     error    p
method
AdaBoost           0.214533  0.216136  2.523354e-09  0.266751  0.284832  3.316058e-03  0.181731  0.192985  5.067723e-11    0.050  0.070590  2.365995e-07  0.960859  0.084919  0.0  0.984375  0.046444  0.0  0.962739  0.152133  0.0
Decision Tree      0.200000  0.282360  5.539287e-07  0.690777  0.975239  9.813826e-01  0.170711  0.241010  8.377727e-09    0.050  0.070590  2.365995e-07  0.954545  0.073593  0.0  1.000000  0.000000  0.0  1.000000  0.000000  0.0
Gaussian Process   0.248299  0.233660  5.571488e-08  0.231293  0.167469  1.166786e-06  0.226209  0.221771  1.002195e-08    0.075  0.085310  3.288484e-06  0.977273  0.048884  0.0  0.983970  0.036602  0.0  0.967939  0.113686  0.0
Linear SVM         0.195653  0.169766  1.953849e-12  0.171331  0.106189  8.714501e-13  0.182363  0.173447  2.092714e-12    0.075  0.085310  6.271107e-07  0.992424  0.025391  0.0  0.993883  0.020471  0.0  0.989313  0.046518  0.0
Naive Bayes        0.182688  0.199860  1.436482e-10  0.153294  0.146642  2.446338e-09  0.169801  0.189483  2.112408e-11    0.050  0.070590  2.365995e-07  0.989899  0.025705  0.0  0.992154  0.029191  0.0  0.985926  0.053426  0.0
Nearest Neighbors  0.288888  0.292454  8.819375e-06  0.758788  0.972439  9.062639e-01  0.253939  0.255113  3.272489e-07    0.075  0.085310  3.288484e-06  0.945707  0.079545  0.0  0.991736  0.030951  0.0  0.985062  0.062596  0.0
Neural Net         0.241892  0.180491  6.591102e-11  0.225558  0.116770  2.636651e-10  0.213904  0.178405  1.739092e-11    0.050  0.070590  2.365995e-07  0.979798  0.041179  0.0  0.985330  0.040191  0.0  0.971326  0.097755  0.0
QDA                0.212993  0.231863  1.247745e-08  0.229875  0.279135  1.326240e-03  0.194385  0.210940  6.717171e-10    0.075  0.085310  6.271107e-07  0.974747  0.062467  0.0  0.984199  0.046699  0.0  0.965601  0.119770  0.0
RBF SVM            0.214270  0.250165  6.537310e-08  0.217172  0.210803  2.886575e-05  0.185181  0.225345  2.477126e-09    0.050  0.070590  2.365995e-07  0.969697  0.060865  0.0  0.980435  0.051863  0.0  0.957777  0.153369  0.0
Random Forest      0.234000  0.239004  3.497739e-08  0.462160  0.698397  4.890795e-01  0.205669  0.216480  1.355248e-09    0.075  0.085310  6.271107e-07  0.972222  0.063131  0.0  0.993883  0.017963  0.0  0.989313  0.050657  0.0
iid                1.017778  0.042969           NaN  0.702051  0.021516           NaN  1.021406  0.051753           NaN    0.550  0.161133           NaN  0.500000  0.000000  NaN  0.550000  0.150000  NaN  0.000000  0.000000  NaN

Dataset 2 (Linear)

                          AP        p        AUC        p      AUPRG        p      Brier        p NLL (nats)        p     sphere        p   zero one        p
AdaBoost           0.984(47)  <0.0001  0.961(85)  <0.0001  0.96274    <0.0001  0.21(22)   <0.0001  0.27(29)    0.0034  0.18(20)   <0.0001  0.050(71)  <0.0001
Decision Tree      1(0)       <0.0001  0.955(74)  <0.0001  1(0)       <0.0001  0.20(29)   <0.0001  0.69(98)    0.9814  0.17(25)   <0.0001  0.050(71)  <0.0001
Gaussian Process   0.984(37)  <0.0001  0.977(49)  <0.0001  0.96794    <0.0001  0.25(24)   <0.0001  0.23(17)   <0.0001  0.23(23)   <0.0001  0.075(86)  <0.0001
Linear SVM         0.994(21)  <0.0001  0.992(26)  <0.0001  0.989(47)  <0.0001  0.20(17)   <0.0001  0.17(11)   <0.0001  0.18(18)   <0.0001  0.075(86)  <0.0001
Naive Bayes        0.992(30)  <0.0001  0.990(26)  <0.0001  0.986(54)  <0.0001  0.18(20)   <0.0001  0.15(15)   <0.0001  0.17(19)   <0.0001  0.050(71)  <0.0001
Nearest Neighbors  0.992(31)  <0.0001  0.946(80)  <0.0001  0.985(63)  <0.0001  0.29(30)   <0.0001  0.76(98)    0.9063  0.25(26)   <0.0001  0.075(86)  <0.0001
Neural Net         0.985(41)  <0.0001  0.980(42)  <0.0001  0.971(98)  <0.0001  0.24(19)   <0.0001  0.23(12)   <0.0001  0.21(18)   <0.0001  0.050(71)  <0.0001
QDA                0.984(47)  <0.0001  0.975(63)  <0.0001  0.96560    <0.0001  0.21(24)   <0.0001  0.23(28)    0.0014  0.19(22)   <0.0001  0.075(86)  <0.0001
RBF SVM            0.980(52)  <0.0001  0.970(61)  <0.0001  0.95778    <0.0001  0.21(26)   <0.0001  0.22(22)   <0.0001  0.19(23)   <0.0001  0.050(71)  <0.0001
Random Forest      0.994(18)  <0.0001  0.972(64)  <0.0001  0.989(51)  <0.0001  0.23(24)   <0.0001  0.46(70)    0.4891  0.21(22)   <0.0001  0.075(86)  <0.0001
iid                0.55(15)       N/A  0.5(0)         N/A  0(0)           N/A  1.018(43)      N/A  0.702(22)      N/A  1.021(52)      N/A  0.55(17)       N/A

Dataset 2 (Linear) in LaTeX

\begin{tabular}{|l|Sr|Sr|Sr|Sr|Sr|Sr|Sr|}
\toprule
{} &                      {AP} &      {p} &      {AUC} &      {p} &    {AUPRG} &      {p} &    {Brier} &      {p} & {NLL (nats)} &      {p} &   {sphere} &      {p} & {zero one} &      {p} \\
\midrule
AdaBoost          &  0.984(47) &  <0.0001 &  0.961(85) &  <0.0001 &  0.96274   &  <0.0001 &  0.21(22)  &  <0.0001 &    0.27(29)  &   0.0034 &  0.18(20)  &  <0.0001 &  0.050(71) &  <0.0001 \\
Decision Tree     &  1(0)      &  <0.0001 &  0.955(74) &  <0.0001 &  1(0)      &  <0.0001 &  0.20(29)  &  <0.0001 &    0.69(98)  &   0.9814 &  0.17(25)  &  <0.0001 &  0.050(71) &  <0.0001 \\
Gaussian Process  &  0.984(37) &  <0.0001 &  0.977(49) &  <0.0001 &  0.96794   &  <0.0001 &  0.25(24)  &  <0.0001 &    0.23(17)  &  <0.0001 &  0.23(23)  &  <0.0001 &  0.075(86) &  <0.0001 \\
Linear SVM        &  0.994(21) &  <0.0001 &  0.992(26) &  <0.0001 &  0.989(47) &  <0.0001 &  0.20(17)  &  <0.0001 &    0.17(11)  &  <0.0001 &  0.18(18)  &  <0.0001 &  0.075(86) &  <0.0001 \\
Naive Bayes       &  0.992(30) &  <0.0001 &  0.990(26) &  <0.0001 &  0.986(54) &  <0.0001 &  0.18(20)  &  <0.0001 &    0.15(15)  &  <0.0001 &  0.17(19)  &  <0.0001 &  0.050(71) &  <0.0001 \\
Nearest Neighbors &  0.992(31) &  <0.0001 &  0.946(80) &  <0.0001 &  0.985(63) &  <0.0001 &  0.29(30)  &  <0.0001 &    0.76(98)  &   0.9063 &  0.25(26)  &  <0.0001 &  0.075(86) &  <0.0001 \\
Neural Net        &  0.985(41) &  <0.0001 &  0.980(42) &  <0.0001 &  0.971(98) &  <0.0001 &  0.24(19)  &  <0.0001 &    0.23(12)  &  <0.0001 &  0.21(18)  &  <0.0001 &  0.050(71) &  <0.0001 \\
QDA               &  0.984(47) &  <0.0001 &  0.975(63) &  <0.0001 &  0.96560   &  <0.0001 &  0.21(24)  &  <0.0001 &    0.23(28)  &   0.0014 &  0.19(22)  &  <0.0001 &  0.075(86) &  <0.0001 \\
RBF SVM           &  0.980(52) &  <0.0001 &  0.970(61) &  <0.0001 &  0.95778   &  <0.0001 &  0.21(26)  &  <0.0001 &    0.22(22)  &  <0.0001 &  0.19(23)  &  <0.0001 &  0.050(71) &  <0.0001 \\
Random Forest     &  0.994(18) &  <0.0001 &  0.972(64) &  <0.0001 &  0.989(51) &  <0.0001 &  0.23(24)  &  <0.0001 &    0.46(70)  &   0.4891 &  0.21(22)  &  <0.0001 &  0.075(86) &  <0.0001 \\
iid               &  0.55(15)  &     {--} &  0.5(0)    &     {--} &  0(0)      &     {--} &  1.018(43) &     {--} &    0.702(22) &     {--} &  1.021(52) &     {--} &  0.55(17)  &     {--} \\
\bottomrule
\end{tabular}

ROC curves

The just_benchmark routine also produces ROC curves with error bars from a bootstrap analysis, which has been vectorized for speed.

Precision-recall curves

Precision-recall-gain curves

Usage for regression problems

The mlpaper package can also be applied to a regression problem with:

import mlpaper.regression as btr
from mlpaper.regression import STD_REGR_LOSS

full_tbl = btr.just_benchmark(
    X_train, y_train, X_test, y_test, regressors, STD_REGR_LOSS, "iid", pairwise_CI=True
)

Here we have used pairwise_CI=True, which bases the confidence intervals on the uncertainty in the loss difference from the reference method rather than on the uncertainty in the loss itself.
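
To see why a pairwise interval can be much tighter, consider losses that share a per-point difficulty between a model and the reference. The sketch below uses simulated data and a normal approximation for the interval (not the t-test, bootstrap, or Bernstein options of the package); `mean_ci_half_width` and all variable names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def mean_ci_half_width(values, confidence=0.95):
    """Half-width of a normal-approximation CI on the mean (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * np.std(values, ddof=1) / np.sqrt(len(values))


rng = np.random.default_rng(0)
n = 500
difficulty = rng.exponential(size=n)  # shared by both methods on each test point
loss_ref = difficulty + rng.normal(0.0, 0.1, size=n)
loss_model = 0.9 * difficulty + rng.normal(0.0, 0.1, size=n)

# CI on the model's own mean loss: dominated by the spread in difficulty.
unpaired = mean_ci_half_width(loss_model)
# CI on the mean loss difference to the reference: shared difficulty mostly cancels.
paired = mean_ci_half_width(loss_model - loss_ref)
print(unpaired, paired)
```

Because both methods are evaluated on the same test points, most of the variance cancels in the difference, which makes small but consistent gains easier to detect.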

Output

By extending the sklearn regression demo we can make simple formatted tables:

             MAE       p          MSE        p   NLL (nats)        p
BLR  0.96933(30)  0.0979  1.39881(67)   0.0665  1.58842(57)   0.9828
GPR  0.75(13)     0.0009  0.75(28)     <0.0001  1.27(12)     <0.0001
iid  0.96908         N/A  1.3982           N/A  1.5884           N/A

or in LaTeX:

\begin{tabular}{|l|Sr|Sr|Sr|}
\toprule
{}  &        {MAE} &     {p} &        {MSE} &      {p} & {NLL (nats)} &      {p} \\
\midrule
BLR &  0.96933(30) &  0.0979 &  1.39881(67) &   0.0665 &  1.58842(57) &   0.9828 \\
GPR &  0.75(13)    &  0.0009 &  0.75(28)    &  <0.0001 &  1.27(12)    &  <0.0001 \\
iid &  0.96908     &     N/A &  1.3982      &      N/A &  1.5884      &      N/A \\
\bottomrule
\end{tabular}

Contributing

The following instructions have been tested with Python 3.7.4 on Mac OS (10.14.6).

Install in editable mode

First, define the variables for the paths we will use:

GIT=/path/to/where/you/put/repos
ENVS=/path/to/where/you/put/virtualenvs

Then clone the repo in your git directory $GIT:

cd $GIT
git clone https://github.com/rdturnermtl/mlpaper.git

Inside your virtual environments folder $ENVS, make the environment:

cd $ENVS
virtualenv mlpaper --python=python3.7
source $ENVS/mlpaper/bin/activate

Now we can install the pip dependencies. Move back into your git directory and run:

cd $GIT/mlpaper
pip install -r requirements/base.txt
pip install -e .  # Install the package itself

Contributor tools

First, we need to set up some tools:

cd $ENVS
virtualenv mlpaper_tools --python=python3.7
source $ENVS/mlpaper_tools/bin/activate
pip install -r $GIT/mlpaper/requirements/tools.txt

To install the pre-commit hooks for contributing, run (in the mlpaper_tools environment):

cd $GIT/mlpaper
pre-commit install

To rebuild the requirements, we can run:

cd $GIT/mlpaper

# Check if there are any discrepancies in the .in files
pipreqs mlpaper/ --diff requirements/base.in
pipreqs tests/ --diff requirements/test.in
pipreqs demos/ --diff requirements/demo.in
pipreqs docs/ --diff requirements/docs.in

# Regenerate the .txt files from .in files
pip-compile-multi --no-upgrade

Generating the documentation

First, set up the environment for building with Sphinx:

cd $ENVS
virtualenv mlpaper_docs --python=python3.7
source $ENVS/mlpaper_docs/bin/activate
pip install -r $GIT/mlpaper/requirements/docs.txt

Then we can do the build:

cd $GIT/mlpaper/docs
make all
open _build/html/index.html

This builds the documentation in all the formats defined in the Makefile. Use make html to generate only the HTML documentation.

Running the tests

The tests for this package can be run with:

cd $GIT/mlpaper
./local_test.sh

The script creates an environment using the requirements found in requirements/test.txt. A code coverage report will also be produced in $GIT/mlpaper/htmlcov/index.html.

Deployment

The wheel (tar ball) for deployment as a pip installable package can be built using the script:

cd $GIT/mlpaper/
./build_wheel.sh

Links

The source is hosted on GitHub.

The documentation is hosted at Read the Docs.

Installable from PyPI.

License

This project is licensed under the Apache 2 License - see the LICENSE file for details.