SMOTE, ADASYN etc

Question

vm885032 opened this issue a year ago · 1 comments

Hi Sole,

Suppose I have a dataset that contains both one-hot encoded features and continuous features and is imbalanced with respect to the target class. I am concerned that blindly applying SMOTE() in the pipeline will create synthetic points with in-between values in the one-hot encoded columns, making the model learn from unrealistic values. Is there a way around this?

Thanks
Vivek

Answer 1 · 2023-08-14T07:55:09.000Z

Hi @vm885032

I understand that what you want is for the synthetic minority examples to have only values of 0 and 1 in the OHE features.

I don't think there is a SMOTE procedure that achieves that if you also have numerical features in the data. If you had only categorical features, you could use SMOTE-N. But with a mix, I don't think it is possible with the currently available methods.
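
To make the concern from the question concrete, here is a minimal sketch of the interpolation step that plain SMOTE uses to build a synthetic point, x_new = x_i + lam * (x_j - x_i) with lam drawn uniformly at random between 0 and 1. The helper `smote_interpolate`, the column layout, and the sample values are hypothetical and only for illustration; this is not the imbalanced-learn implementation.

```python
import random


def smote_interpolate(x, neighbor, rng):
    """Return a synthetic point on the line segment between x and neighbor."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical minority samples with columns [age, color_red, color_blue].
# The last two columns are a one-hot encoding of a single "color" feature.
x_i = [30.0, 1.0, 0.0]
x_j = [40.0, 0.0, 1.0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic = smote_interpolate(x_i, x_j, rng)
print(synthetic)
```

When the two neighbours belong to different categories, as above, the OHE columns of the synthetic point end up with fractional values strictly between 0 and 1, which is the situation described in the question. The continuous column is interpolated in the same way, which is usually what you want for that kind of feature.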