SMOTE, ADASYN etc
vm885032 opened this issue · 1 comment
Hi Sole,
Suppose I have a dataset that contains both one-hot encoded and continuous features and is imbalanced with respect to the target class. I am concerned that blindly applying SMOTE() in the pipeline will create synthetic points with in-between values for the one-hot encoded features, causing the model to learn from unrealistic values. Is there a way around this?
Thanks
Vivek
Hi @vm885032
I understand that what you want is for the synthetic minority examples to take only the values 0 and 1 for the OHE features.
I don't think there is a SMOTE procedure that achieves that if you also have numerical features in the data. If you had only categorical features, you could use SMOTE-N. But with a mix of the two, I don't think it is possible with the currently available methods.
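
To make the concern concrete, here is a minimal sketch of the interpolation step that plain SMOTE applies to every column: a synthetic point is drawn somewhere on the line segment between a minority sample and one of its minority neighbours. The helper `smote_interpolate`, the column layout, and the two rows are made up for illustration; this is not the imbalanced-learn implementation, and the nearest-neighbour search is skipped.

```python
import random


def smote_interpolate(x, neighbor, rng):
    """Plain SMOTE step: a random point on the segment between x and neighbor."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical minority rows with columns [age, color_red, color_blue],
# where the last two columns are a one-hot encoded feature.
x = [30.0, 1.0, 0.0]
neighbor = [40.0, 0.0, 1.0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic = smote_interpolate(x, neighbor, rng)
print(synthetic)
```

Whenever the two rows belong to different categories, the one-hot columns of the synthetic row land strictly between 0 and 1, which is exactly the unrealistic value described in the question. The continuous column (age) is interpolated sensibly, which is why the problem only shows up for the encoded features.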