ml-tooling/best-of-ml-python

Add project: miceforest

samFarrellDay opened this issue · 1 comment

Project details:
Missing data imputation is a widely used method for dealing with missing data in machine learning and statistical workflows. miceforest aims to provide extremely accurate imputations using lightgbm, while being as lightweight and fast as possible. This package can:

  1. Impute multiple datasets, so the user can perform Multiple Imputation by Chained Equations (MICE)
  2. Plot the imputed correlations, distributions, feature importance, and more
  3. Train models on one dataset and impute a different dataset (useful for production environments)
  4. Be GPU accelerated through the lightgbm API
  5. Impute data in place, so the dataset never has to be copied (useful for huge datasets)
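
For readers unfamiliar with chained equations, here is a minimal sketch of the idea behind item 1. It is not miceforest's API and does not use lightgbm: it swaps in plain least squares on a two-column table, produces a single imputed dataset rather than several, and the names `fit_line` and `chained_impute` are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # fall back to a flat line if x is constant
    return slope, my - slope * mx


def chained_impute(rows, iterations=10):
    """Fill None entries in a two-column table; returns a new table."""
    # Remember which cells were missing in the input.
    missing = [[value is None for value in row] for row in rows]

    # Start from the mean of the observed values in each column.
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(2)]
    data = [[col_means[j] if r[j] is None else r[j] for j in range(2)] for r in rows]

    # Cycle through the columns: fit each column on the other one using the
    # rows where the target was observed, then re-predict its missing cells.
    for _ in range(iterations):
        for target in range(2):
            other = 1 - target
            train = [r for r, m in zip(data, missing) if not m[target]]
            slope, intercept = fit_line(
                [r[other] for r in train], [r[target] for r in train]
            )
            for r, m in zip(data, missing):
                if m[target]:
                    r[target] = slope * r[other] + intercept
    return data
```

Calling `chained_impute([[1, 3], [2, 5], [3, None], [4, 9], [None, 11], [6, 13]])` leaves the observed cells untouched and fills the two gaps with values close to the line the observed rows lie on. Multiple imputation, as the package describes it, would repeat such a procedure with some randomness to obtain several completed datasets; that part is omitted here.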

Additional context:
I am wondering if Missing Data Imputation should be its own category. It is very often used in machine learning, especially predictive modeling. There is another missing data imputation project already listed here, fancyimpute. What do you think?

Thanks for the suggestion. I added miceforest to the tabular data section. If more projects related to Missing Data Imputation come in, I will consider adding a new category for them.