bmurauer/pipelinehelper

scoring with ROC_AUC

Closed this issue · 4 comments

Great work here,

and I do realize ROC_AUC scoring doesn't work due to the lack of forwarding of predict_proba etc., but I was wondering: do you have an ETA on this? Would love to use your class with roc_auc scoring.

Good catch, thanks! I should be able to fix that in a few days.
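The fix being discussed amounts to forwarding predict_proba from the wrapper to whichever inner estimator was selected, since the roc_auc scorer calls predict_proba (or decision_function) on the outer object. A minimal sketch of that delegation pattern, assuming a simple hypothetical wrapper class (not the actual pipelinehelper implementation):

```python
from sklearn.base import BaseEstimator, ClassifierMixin


class DelegatingClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical sketch: a wrapper that forwards predict_proba to the
    wrapped estimator, so scoring='roc_auc' can reach it."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X)

    def predict_proba(self, X):
        # Without this forwarding method, GridSearchCV's roc_auc scorer
        # would fail because the outer object has no predict_proba.
        return self.estimator.predict_proba(X)
```

Any scorer built with `needs_proba` (such as `'roc_auc'`) will then work on the wrapper exactly as it does on the bare estimator.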

One other thing.

In your example you run nb_pipe with a MinMaxScaler in addition to the std and max scalers. Why?

From what I understand, std runs first and standardizes the data (zero mean, unit variance), while minmax runs just before the MultinomialNB and rescales it to between 0 and 1. That seems a little redundant, or am I wrong?

I only ask because the following output was produced by your example on a dataset I'm using.

# Tuning hyper-parameters for accuracy

Fitting 3 folds for each of 3738 candidates, totalling 11214 fits
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:    5.4s
[Parallel(n_jobs=-1)]: Done 434 tasks      | elapsed:   12.3s
[Parallel(n_jobs=-1)]: Done 784 tasks      | elapsed:   21.8s
[Parallel(n_jobs=-1)]: Done 1234 tasks      | elapsed:   39.7s
[Parallel(n_jobs=-1)]: Done 1784 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done 2434 tasks      | elapsed:  2.3min
[Parallel(n_jobs=-1)]: Done 3184 tasks      | elapsed:  4.6min
[Parallel(n_jobs=-1)]: Done 4034 tasks      | elapsed:  9.5min
[Parallel(n_jobs=-1)]: Done 4984 tasks      | elapsed: 15.8min
[Parallel(n_jobs=-1)]: Done 6034 tasks      | elapsed: 71.6min
[Parallel(n_jobs=-1)]: Done 7184 tasks      | elapsed: 211.5min
[Parallel(n_jobs=-1)]: Done 8434 tasks      | elapsed: 967.1min
[Parallel(n_jobs=-1)]: Done 9784 tasks      | elapsed: 982.8min
[Parallel(n_jobs=-1)]: Done 11214 out of 11214 | elapsed: 998.4min finished
{'classifier__selected_model': ('nb_pipe', {'nb__fit_prior': True, 'nb__alpha': 0.1}), 'scaler__selected_model': ('std', {'with_mean': True, 'with_std': True})}
0.91085

Or does it just ignore the MinMaxScaler? Or does it in fact run the MinMaxScaler and mislabel it?

You are absolutely right, I will change the example to contain more useful pipeline elements.

Originally, I used the helper with two different scalers, where one needed dense data and one could work on sparse data. I wanted to show that the "densifier" could be combined with the corresponding scaler.

The MinMaxScaler is "required" because MultinomialNB does not accept negative values, but I agree that this example is misleading.
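The interaction can be seen with a quick check: StandardScaler centres the data at zero, so it produces negative values that MultinomialNB would reject, while a following MinMaxScaler maps everything back into [0, 1]. A minimal sketch (the data values here are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# StandardScaler centres the data at zero mean with unit variance,
# so roughly half the values become negative -- MultinomialNB
# would raise an error on this input.
X_std = StandardScaler().fit_transform(X)
print(X_std.min() < 0)  # True

# MinMaxScaler rescales the standardized values into [0, 1],
# making the data acceptable to MultinomialNB again.
X_minmax = MinMaxScaler().fit_transform(X_std)
print(X_minmax.min(), X_minmax.max())  # 0.0 1.0
```

So the two scalers are not redundant in this pipeline: the second one undoes the sign problem the first one introduces.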

The output shows that the nb_pipe yielded the best results. However, only the parameters that were provided explicitly to this part of the pipeline (nb__fit_prior and nb__alpha) will show up in the result list. This means that the MinMaxScaler will have used the parameters from its definition (line 27).
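This mirrors how plain GridSearchCV behaves: best_params_ only lists parameters that were part of the search grid, while every other component keeps the values it was constructed with. A minimal sketch (toy data, not the original example):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ('minmax', MinMaxScaler()),  # no parameters searched: keeps its defaults
    ('nb', MultinomialNB()),
])

# Only nb__alpha is in the grid, so only it appears in best_params_.
grid = GridSearchCV(pipe, {'nb__alpha': [0.1, 1.0]}, cv=2)

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])
grid.fit(X, y)

print(sorted(grid.best_params_))  # ['nb__alpha']
# The unsearched scaler kept its constructor default:
print(grid.best_estimator_.named_steps['minmax'].feature_range)  # (0, 1)
```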

So essentially it is called with default parameters, that is to say MinMaxScaler(), since no parameters were assigned?

Anyway, thanks for the class. Quite useful for me, but it will be more so once roc_auc scoring works.