feature_selection

Feature selection algorithms

This repository houses algorithms related to feature selection. Feature selection has several benefits: it improves training and inference speed, reduces the chance of overfitting, and reduces exposure to upstream data outages, since a model that depends on fewer features depends on fewer data sources.

Summary

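Feature importance scores from a random forest or gradient boosting tree model are a good baseline. Features can then be selected further with more advanced algorithms such as mRMR, for example its FCQ variant, which measures relevance with the F-statistic and redundancy with Pearson correlation.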
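The baseline described above can be sketched roughly as follows. This is a minimal illustration and not code from this repository: it assumes scikit-learn and NumPy are installed, and the synthetic dataset, the number of trees and the cut-off of 5 features are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration). With shuffle=False the
# 3 informative features are columns 0-2 and the other 7 are pure noise.
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

# Fit a random forest and read its impurity-based feature importances.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
importances = model.feature_importances_

# Rank feature indices from most to least important and keep the top 5.
ranking = np.argsort(importances)[::-1]
top_k = ranking[:5]

for idx in top_k:
    print(f"feature {idx}: importance {importances[idx]:.3f}")
```

scikit-learn's `GradientBoostingClassifier` exposes the same `feature_importances_` attribute, so it could be swapped in for the random forest here. Note that this ranking looks at relevance only: two highly correlated features can both rank near the top, which is the gap that mRMR is meant to address.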

Algorithms

  1. Minimum Redundancy Maximum Relevance (mRMR)
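To make the idea behind mRMR concrete, here is a rough sketch of the FCQ variant (F-statistic relevance divided by Pearson-correlation redundancy), assuming a continuous target and NumPy only. It is not the implementation in this repository: the function names `f_statistic` and `mrmr_fcq`, the `floor` parameter and the toy data are all hypothetical and chosen for illustration.

```python
import numpy as np


def f_statistic(x, y):
    """F-statistic of a univariate linear fit of y on x (1 and n - 2 d.o.f.)."""
    r = np.corrcoef(x, y)[0, 1]
    return (r ** 2) / (1.0 - r ** 2) * (len(y) - 2)


def mrmr_fcq(X, y, k, floor=1e-3):
    """Greedy mRMR (FCQ): at each step pick the feature with the highest
    relevance / redundancy quotient.

    relevance  = F-statistic between the feature and the target
    redundancy = mean absolute Pearson correlation with the features
                 selected so far (floored to avoid division by ~0;
                 the floor value is an assumption of this sketch)
    """
    n_features = X.shape[1]
    relevance = np.array([f_statistic(X[:, j], y) for j in range(n_features)])
    abs_corr = np.abs(np.corrcoef(X, rowvar=False))

    selected = []
    remaining = list(range(n_features))
    for _ in range(min(k, n_features)):
        if selected:
            redundancy = abs_corr[np.ix_(remaining, selected)].mean(axis=1)
            redundancy = np.maximum(redundancy, floor)
        else:
            # Nothing selected yet: the first pick is by relevance alone.
            redundancy = np.ones(len(remaining))
        scores = relevance[remaining] / redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: x1 is a near-duplicate of x0, x2 is an independent signal,
# x3 is pure noise.
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x1 = x0 + 0.05 * rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])
y = x0 + 0.8 * x2 + 0.1 * rng.normal(size=n)

selected = mrmr_fcq(X, y, k=2)
print(selected)
```

A ranking by relevance alone would put x0 and x1 in the top two places, because both are strongly related to the target. The quotient penalizes whichever of the two is picked second for being almost perfectly correlated with the first, so the pair should not be selected together when k=2.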

References

Minimum Redundancy Maximum Relevance (mRMR)

  1. https://towardsdatascience.com/mrmr-explained-exactly-how-you-wished-someone-explained-to-you-9cf4ed27458b
  2. https://eng.uber.com/optimal-feature-discovery-ml/

Papers

mRMR

  1. https://arxiv.org/pdf/1908.05376.pdf - Uber mRMR