WillKoehrsen/feature-selector

identify_collinear gives wrong results when features with 100% missing values exist


Here is the situation: if my data has a feature with 100% missing values, or a very high share such as 98%, calling identify_collinear() flags more features than expected as having a correlation magnitude greater than the correlation_threshold.

I checked the result of pd.DataFrame.corr(): there were high correlations between some features and the feature with 98% missing values. So when we call identify_all(), more features are removed than should be. We should first remove the features whose share of missing values exceeds the threshold, and only then identify collinear features. Maybe there are better strategies.
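
A minimal sketch of what I think is going on, using plain pandas rather than the FeatureSelector class. pd.DataFrame.corr() computes each coefficient only from the rows where both columns are non-null, so a feature with just two observed values ends up with a correlation of magnitude 1 against any other feature. The column names, the data, the 0.6 missing threshold and the min_periods value of 30 are all made up for illustration; they are not taken from the library.

```python
import numpy as np
import pandas as pd

# Two independent random features (illustrative data only).
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "a": rng.normal(size=n),
    "b": rng.normal(size=n),
})

# Hypothetical feature with 98% missing values: only two rows are observed.
sparse = np.full(n, np.nan)
sparse[0], sparse[1] = 1.0, 2.0
df["mostly_missing"] = sparse

# corr() uses pairwise-complete rows, so this coefficient is based on
# two points only and has magnitude 1 even though the features are unrelated.
corr = df.corr()
spurious = corr.loc["a", "mostly_missing"]

# Proposed order: drop features above the missing threshold first,
# then compute correlations on what is left.
missing_threshold = 0.6  # assumed value for this example
missing_fraction = df.isnull().mean()
keep = missing_fraction[missing_fraction <= missing_threshold].index
filtered_corr = df[keep].corr()

# Possible alternative: require a minimum number of paired observations,
# so that under-supported pairs come back as NaN instead of +/-1.
guarded_corr = df.corr(min_periods=30)  # 30 is an arbitrary choice
```

With this ordering, mostly_missing never reaches the correlation step, so it cannot drag unrelated features over the correlation_threshold. The min_periods variant may be another option, since it leaves the column in place but returns NaN for pairs with too few shared observations.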