iskandr/fancyimpute

[KNN] Warning: 57/1999 still missing after imputation, replacing with 0

Gaurav7296 opened this issue · 3 comments

import pandas as pd
from fancyimpute import KNN

# df_train and dataset are defined elsewhere (not shown in this snippet).
knn = KNN(3)
df = pd.DataFrame(data=knn.fit_transform(df_train), columns=dataset.columns)

I tried this, but all the missing values are filled with 0.

Hello,

Please provide a complete reproducible example with toy data, as well as a description of what you expect the output to be.

There are two columns in this test dataset: x is the column where I want to impute missing values, and y is a dummy column.

I expect an output df with the missing values imputed. Currently I get 0 in place of the missing values in the output.
test.zip

import numpy as np
import pandas as pd
from fancyimpute import KNN

dataset = pd.read_csv('test.csv')
# Keep only column x.
dataset = dataset[['x']]
# np.float was removed in NumPy 1.24; np.float64 selects the same float columns.
df_train = dataset.select_dtypes(include=[np.float64]).values

knn = KNN(3)

df = pd.DataFrame(data=knn.fit_transform(df_train), columns=dataset.columns, index=dataset.index)

You're only passing a single column to KNN, and that won't work. KNN imputes a missing value by finding the nearest neighbors of the incomplete row, measured on the other columns where that row's elements are not missing. You have no other columns, so this becomes impossible and the algorithm reverts to a default of 0. Univariate feature imputation (when you only have one column) doesn't work with anything more complex than SimpleImputer: https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
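As a rough sketch of the two options, the snippet below uses scikit-learn rather than fancyimpute: `SimpleImputer` for the single-column case, and scikit-learn's own `KNNImputer` (a swapped-in stand-in, not fancyimpute's `KNN`) when x and y are passed together. The toy array is hypothetical and only stands in for test.csv, whose contents are not shown here.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Hypothetical toy data standing in for test.csv: column 0 is x, column 1 is y.
data = np.array([
    [1.0, 1.1],
    [2.0, 2.1],
    [np.nan, 3.1],
    [4.0, 4.1],
    [np.nan, 5.1],
    [6.0, 6.1],
])

# Univariate case: only x is available, so fill the gaps with the column mean.
x_only = data[:, [0]]
x_mean_filled = SimpleImputer(strategy="mean").fit_transform(x_only)

# Multivariate case: pass x and y together so neighbors can be found through y.
xy_knn_filled = KNNImputer(n_neighbors=3).fit_transform(data)

print(x_mean_filled.ravel())
print(xy_knn_filled[:, 0])
```

Whether the KNN variant is appropriate depends on y actually carrying information about x. With a true dummy column, the mean (or median) fill is probably the more honest choice.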