sklearn.impute.KNNImputer
jamesmyatt opened this issue · 3 comments
What's the relationship between fancyimpute.KNN and sklearn.impute.KNNImputer (since scikit-learn v0.22)?
Naively, I assume they do the same thing, and I think they both trace back to Troyanskaya et al., 2001.
I believe they were developed totally independently.
Ours was inspired by this: https://www.rdocumentation.org/packages/imputation/versions/2.0.3/topics/kNNImpute
Imputation using k-nearest neighbors. For each record, identify missing features. For each missing feature find the k nearest neighbors which have that feature. Impute the missing value using the imputation function on the k-length vector of values found from the neighbors.
Imputation function ends up being the mean.
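As a rough illustration of that description (not the fancyimpute code itself), here is a plain-Python sketch. The function name knn_impute, the use of None for missing values, and the choice of Euclidean distance over the features both records have are my assumptions for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest
    rows that have the feature observed (hypothetical helper)."""

    def distance(a, b):
        # Assumption: Euclidean distance over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Only rows that actually have feature j can donate a value.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda c: c[0])
            nearest = [v for d, v in candidates[:k] if d < math.inf]
            if nearest:
                # The imputation function is the mean of the neighbors' values.
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


data = [[1, 2, None], [3, 4, 3], [None, 6, 5], [8, 8, 7]]
print(knn_impute(data, k=2))
# [[1, 2, 4.0], [3, 4, 3], [5.5, 6, 5], [8, 8, 7]]
```

Entries stay None when no other row has that feature, which a real implementation would have to handle some other way.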
The scikit-learn one has a really similar-sounding description:
Each sample’s missing values are imputed using the mean value from n_neighbors nearest neighbors found in the training set. Two samples are close if the features that neither is missing are close.
So, I would think these are extremely similar but may give slightly different results and have different runtimes.
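One place the results could differ is the distance. As far as I can tell from the scikit-learn documentation, KNNImputer defaults to the nan_euclidean metric, which only uses coordinates present in both samples and scales the squared distance by (total coordinates / present coordinates). A plain-Python sketch of that rule, with None standing in for NaN and a function name I made up:

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both a and b,
    scaled up to account for the missing ones (illustrative only)."""
    present = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not present:
        return math.nan  # no common coordinates to compare
    weight = len(a) / len(present)
    return math.sqrt(weight * sum((x - y) ** 2 for x, y in present))


# Two of four coordinates are shared: sqrt(4/2 * ((3-1)**2 + (6-5)**2))
d = nan_euclidean([3, None, None, 6], [1, None, 4, 5])
print(d == math.sqrt(10))  # True
```

Whether fancyimpute.KNN weights its distances the same way is something I have not checked, so treat this as describing the scikit-learn side only.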
Thanks @sergeyf.
After some digging in the scikit-learn GitHub: KNNImputer was added in scikit-learn/scikit-learn#12852, which completes the work of scikit-learn/scikit-learn#9212, which in turn takes inspiration from the R package impute from Bioconductor.
So it sounds like the implementations are totally independent.
Can you tell from the descriptions how the available options compare?
Not really :)