Different methods for a training set and a test set?
Closed this issue · 4 comments
Hi,
I've been looking at the MICE class and I am wondering if there are different methods for the training and testing phases. Usually a class has a fit_transform method for the training phase and a transform method for the testing phase. From what I understood, the complete method performs MICE for a training set (with the fit and predict methods called for model). Therefore it shouldn't be used for the testing set. Is it possible to have access to the fitted predictors so that you can call the predict method on the testing set?
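To make the fit_transform / transform distinction concrete, here is a minimal sketch using a toy column-mean imputer. The MeanImputer class and its data are hypothetical and have nothing to do with fancyimpute's MICE; they only show statistics being learned on the training set and reused unchanged on the test set.

```python
import math


class MeanImputer:
    """Toy column-mean imputer (hypothetical, for illustration only)."""

    def fit(self, rows):
        # Learn one mean per column from the observed (non-NaN) training values.
        # Assumes every column has at least one observed value.
        n_cols = len(rows[0])
        self.means_ = []
        for j in range(n_cols):
            observed = [r[j] for r in rows if not math.isnan(r[j])]
            self.means_.append(sum(observed) / len(observed))
        return self

    def transform(self, rows):
        # Fill gaps with the stored training means; nothing is re-estimated here.
        return [
            [self.means_[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows
        ]

    def fit_transform(self, rows):
        # Training phase: learn the means, then fill the same data.
        return self.fit(rows).transform(rows)


nan = float("nan")
X_train = [[1.0, 10.0], [3.0, nan], [nan, 30.0]]
X_test = [[nan, 5.0], [7.0, nan]]

imputer = MeanImputer()
X_train_filled = imputer.fit_transform(X_train)  # training phase
X_test_filled = imputer.transform(X_test)        # testing phase, reuses training means
```

The test rows are filled with the training means (2.0 and 20.0), which is the "inductive" behaviour the question is about.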
I may have gotten something wrong, any help would be really appreciated.
Best regards,
Johann
Hi Johann,
Thanks for your interest. MICE as is doesn't have an inductive mode, but there's ongoing work to get a MICE-like imputer into scikit-learn, to be released with the upcoming v0.20, and it has all the expected sklearn API - including fit, transform, and fit_transform. I don't know exactly when v0.20 will be released, but when it is, MICE will be removed from fancyimpute.
You can find it in sklearn's master here: http://scikit-learn.org/dev/modules/generated/sklearn.impute.ChainedImputer.html
But it will soon be renamed to IterativeImputer.
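For reference, a rough sketch of the fit / transform usage under the IterativeImputer name. It assumes a scikit-learn version that ships IterativeImputer, where it is exposed behind an experimental import; the arrays and parameter values are made up for illustration.

```python
import numpy as np
# The experimental import must come first; it enables the IterativeImputer import below.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up data: the second column is roughly twice the first.
X_train = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, np.nan],
    [np.nan, 10.0],
])
X_test = np.array([
    [np.nan, 8.0],
    [6.0, np.nan],
])

imputer = IterativeImputer(max_iter=10, random_state=0)
imputer.fit(X_train)                       # per-feature models are fitted on training data only
X_test_filled = imputer.transform(X_test)  # test gaps are filled with those fitted models
```

Observed test values are left as they are; only the missing entries are predicted from the models fitted on the training set.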
Hi Sergey,
Thank you for your quick reply. I'm glad to see that the sklearn.impute module will have more tools to perform imputation in the near future and that you are taking part in it.
Just to make sure: do none of the classes in fancyimpute currently implement an inductive mode? (It may not make sense for some of them to have one; I don't know some of those tools very well.)
Thanks in advance.
Edit: looks like BiScaler has an inductive mode.
You are correct - we didn't aim for inductive use of fancyimpute as that is not generally how matrix-filling is treated in the literature. In retrospect, that was an oversight. Unfortunately, fancyimpute is currently in "just barely supported" mode instead of "active development" mode. There are no plans from us to do the necessary overhaul, but we would welcome and support PRs.
Looks like the sklearn method is being put off until another version, so I guess this won't be happening until about a year from now.