iskandr/fancyimpute

Different methods for a training set and a test set?

Closed this issue · 4 comments

Hi,

I've been looking at the MICE class and I am wondering whether there are different methods for the training and testing phases. Usually a class has a fit_transform method for the training phase and a transform method for the testing phase. From what I understand, the complete method performs MICE on a training set (fitting models and calling their predict methods internally), so it shouldn't be used on a test set. Is it possible to get access to the fitted predictors so that their predict method can be called on the test set?
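To make the distinction concrete, here is a rough sketch: the data is made up, and the fit_transform/transform calls at the end are hypothetical, just the usual scikit-learn pattern I have in mind (only the complete() call is fancyimpute's current API):

```python
import numpy as np
from fancyimpute import MICE

# Small illustrative matrices with missing entries marked as np.nan.
X_train = np.array([[1.0, 2.0, 3.0],
                    [4.0, np.nan, 6.0],
                    [np.nan, 8.0, 9.0],
                    [10.0, 11.0, np.nan]])
X_test = np.array([[np.nan, 5.0, 6.0],
                   [7.0, np.nan, 9.0]])

# Transductive usage (what fancyimpute offers today): complete() fits its
# internal regressors on the very matrix it imputes.
X_train_filled = MICE().complete(X_train)

# Inductive usage (what I am asking about) would look like the usual
# scikit-learn pattern -- hypothetical, not fancyimpute's current API:
# imputer = MICE()
# X_train_filled = imputer.fit_transform(X_train)  # fit predictors on train
# X_test_filled = imputer.transform(X_test)        # reuse them on the test set
```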

I may have gotten something wrong; any help would be really appreciated.

Best regards,
Johann

Hi Johann,

Thanks for your interest. MICE as-is doesn't have an inductive mode, but there is ongoing work to get a MICE-like imputer into scikit-learn, to be released with the upcoming v0.20, and it has all the expected sklearn API, including fit, transform, and fit_transform. I don't know exactly when v0.20 will be released, but when it is, MICE will be removed from fancyimpute.

You can find it in sklearn's master here: http://scikit-learn.org/dev/modules/generated/sklearn.impute.ChainedImputer.html

But it will soon be renamed to IterativeImputer.
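For concreteness, here is a minimal sketch of the inductive workflow, written against the IterativeImputer name and the standard fit/transform API; note that scikit-learn exposes it as an experimental feature, so an explicit enabling import is needed (the parameters and data below are purely illustrative):

```python
import numpy as np

# IterativeImputer is exposed as an experimental feature, so this enabling
# import is required before it can be imported from sklearn.impute.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X_train = np.array([[1.0, 2.0, 3.0],
                    [4.0, np.nan, 6.0],
                    [np.nan, 8.0, 9.0],
                    [10.0, 11.0, np.nan]])
X_test = np.array([[np.nan, 5.0, 6.0],
                   [7.0, np.nan, 9.0]])

imputer = IterativeImputer(random_state=0)
imputer.fit(X_train)                       # fit round-robin regressors on the training set
X_test_filled = imputer.transform(X_test)  # reuse the fitted regressors on new data
```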

Hi Sergey,

Thank you for your quick reply. I'm glad to see that the sklearn.impute module will have more imputation tools in the near future and that you are taking part in that work.
Just to make sure: do none of the classes currently implemented in fancyimpute have an inductive mode? (It may not make sense for some of them to have one; I don't know all of these tools very well.)

Thanks in advance.

Edit: it looks like BiScaler has an inductive mode.

You are correct - we didn't aim for inductive use of fancyimpute, as that is not generally how matrix completion is treated in the literature. In retrospect, that was an oversight. Unfortunately, fancyimpute is currently in "just barely supported" mode rather than "active development" mode. We have no plans to do the necessary overhaul ourselves, but we would welcome and support PRs.

Looks like the sklearn imputer is being pushed back to a later release, so I guess this won't be happening until about a year from now.