sklearn.impute.KNNImputer

Question

jamesmyatt opened this issue 4 years ago · 3 comments

jamesmyatt commented 4 years ago

What's the relationship between fancyimpute.KNN and sklearn.impute.KNNImputer (since scikit-learn v0.22)?

Naively, I assume they do the same thing, and I think both trace back to Troyanskaya et al., 2001.

Answer 1 · 2021-01-16T05:40:02.000Z

I believe they were developed totally independently.

Ours was inspired by this: https://www.rdocumentation.org/packages/imputation/versions/2.0.3/topics/kNNImpute

Imputation using k-nearest neighbors. For each record, identify missing features. For each missing feature find the k nearest neighbors which have that feature. Impute the missing value using the imputation function on the k-length vector of values found from the neighbors. Imputation function ends up being the mean.
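For illustration only, the procedure described above can be sketched in plain Python. This is not the code of fancyimpute, the R package, or scikit-learn. The function name `knn_impute` and the choice of distance (mean squared difference over the features both rows have observed) are assumptions made for this sketch.

```python
import math

def knn_impute(rows, k=2):
    """Fill each NaN with the mean of that feature over the k nearest rows."""
    def distance(a, b):
        # Assumed metric: mean squared difference over features
        # that are observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if not math.isnan(x) and not math.isnan(y)]
        return sum(shared) / len(shared) if shared else math.inf

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Candidate neighbors: other rows that have feature j observed.
            candidates = [(distance(row, other), other[j])
                          for m, other in enumerate(rows)
                          if m != i and not math.isnan(other[j])]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                # The imputation function is the mean of the neighbors' values.
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

nan = math.nan
data = [[1, 2, nan], [3, 4, 3], [nan, 6, 5], [8, 8, 7]]
result = knn_impute(data, k=2)
```

Note that neighbors are chosen per missing feature, so a row missing two features may draw on two different sets of neighbors.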

The scikit-learn one has a really similar sounding description:

Each sample’s missing values are imputed using the mean value from n_neighbors nearest neighbors found in the training set. Two samples are close if the features that neither is missing are close.
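As a minimal usage sketch of the scikit-learn side (assuming scikit-learn v0.22 or later and NumPy are installed; the toy matrix `X` is made up for illustration):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data with two missing entries, marked as NaN.
X = np.array([[1, 2, np.nan],
              [3, 4, 3],
              [np.nan, 6, 5],
              [8, 8, 7]])

# n_neighbors sets how many neighbors are averaged; weights="uniform"
# gives each neighbor equal weight (a plain mean).
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

With this input, the missing entry in the first row becomes 4.0 and the one in the third row becomes 5.5.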

So, I would think these are extremely similar but may give slightly different results and have different runtimes.

Answer 2 · 2021-01-16T14:36:14.000Z

Thanks @sergeyf.

After some digging in the scikit-learn GitHub repository: KNNImputer was added in scikit-learn/scikit-learn#12852, which completes the work of scikit-learn/scikit-learn#9212, which in turn takes inspiration from the R package impute from Bioconductor.

So it sounds like the implementations are totally independent.

Can you tell from the descriptions how the available options compare?

Answer 3 · 2021-01-17T02:07:34.000Z

Not really :)