iskandr/fancyimpute

Memory Error using KNN impute on large data

MariaSahakyan opened this issue · 1 comment

Hi. I am having trouble running KNN imputation on my dataset, which has dimensions 356,000 x 247.
I did not think this was particularly large, but every time I run the code I get a memory error, even on an HPC cluster.
What would you suggest doing in this case?

It's doing a kNN over the rows, which means allocating a giant distance matrix that's 356K by 356K:

https://github.com/iskandr/knnimpute/blob/master/knnimpute/common.py#L39

That is probably more RAM than you have.
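As a rough back-of-the-envelope check, a dense 356K by 356K matrix needs about 0.5 TB at 4 bytes per entry and about 1 TB at 8 bytes per entry. The sketch below just does that arithmetic; the helper name `dense_matrix_bytes` is made up for illustration, and the element size actually used by knnimpute is an assumption here, so both common float sizes are shown.

```python
def dense_matrix_bytes(n_rows, itemsize):
    """Bytes needed for a dense n_rows x n_rows matrix (hypothetical helper)."""
    return n_rows * n_rows * itemsize


n_samples, n_features = 356_000, 247

# Element size is an assumption: 4 bytes (float32) or 8 bytes (float64).
for label, itemsize in [("float32", 4), ("float64", 8)]:
    by_rows = dense_matrix_bytes(n_samples, itemsize)
    by_cols = dense_matrix_bytes(n_features, itemsize)
    print(f"{label}: rows -> {by_rows / 1e12:.2f} TB, columns -> {by_cols / 1e6:.2f} MB")
```

The same arithmetic shows why a 247 by 247 matrix is not a concern: it fits in well under a megabyte.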

You can try using orientation='columns', which will make a 247 by 247 matrix instead, but it might be a lot worse performance-wise. Or (my preferred solution) switch to another imputer. I recommend IterativeImputer!
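For reference, here is a minimal sketch of the IterativeImputer route. It assumes the scikit-learn implementation (`sklearn.impute.IterativeImputer`, still behind an experimental import) and uses a small synthetic array as a stand-in for the real data; the `max_iter` and `n_nearest_features` values are illustrative, not tuned recommendations.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables the experimental API
from sklearn.impute import IterativeImputer

# Synthetic stand-in for the real 356,000 x 247 dataset (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)  # give the imputer some structure to learn

# Knock out roughly 10% of the entries at random.
mask = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan

# Each feature is regressed on the others, so no sample-by-sample distance matrix is built.
# n_nearest_features limits how many predictor columns are used per feature.
imputer = IterativeImputer(max_iter=5, n_nearest_features=4, random_state=0)
X_filled = imputer.fit_transform(X_missing)

print(X_filled.shape, np.isnan(X_filled).sum())
```

Because the work scales with the number of features rather than with pairs of samples, this should be far lighter on memory for a tall, narrow dataset like the one described.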