Far0n/xgbfi

Feature's Own Interaction

yilisg opened this issue · 4 comments

Hi Mathias,

Not sure where to ask this question but unfortunately xgbfi doesn't have much documentation (yet).

So my understanding is that f10|f40 means there are notable interactions between feature 10 and feature 40, and I could potentially "help" the classifier by adding new features such as f10 - f40 or f10 * f40. But for a problem I am working on now, I see the top interactions are f10|f10 and f10|f10|f10 and f10|f10|f10|f10... what does that mean? Should I create a new feature called f10 * f10, or should I create an identical copy of f10 as a new feature so that it could be split at more than one node?

Appreciate your clarification!

Li

I'll throw in my (less knowledgeable than Mathias') view on this.

Same-feature "interactions" are just the algorithm discretizing the feature. In other words: if x > 10, and x > 25, and x < 100, then y = 7. That would be a 3x interaction. It does this because the response has a strong dependence on this feature and/or there is a lot of non-linearity.
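To make the "discretizing" point concrete, here is a minimal sketch in plain Python. The `collapse_path` helper and the tuple format for splits are made up for illustration (they are not part of xgbfi or xgboost), and boundary handling (strict vs. non-strict inequality) is ignored. It collapses the conditions along one path of a tree into an interval per feature, so a chain of splits on the same feature ends up as a single bin:

```python
import math

def collapse_path(splits):
    """Collapse split conditions along one tree path into per-feature intervals.

    `splits` is a list of (feature, op, threshold) tuples, a hypothetical
    format chosen for this sketch; op is ">" or "<".
    """
    intervals = {}
    for feature, op, threshold in splits:
        low, high = intervals.get(feature, (-math.inf, math.inf))
        if op == ">":
            low = max(low, threshold)    # tighten the lower bound
        elif op == "<":
            high = min(high, threshold)  # tighten the upper bound
        else:
            raise ValueError(f"unsupported operator: {op}")
        intervals[feature] = (low, high)
    return intervals

# The path from the comment above: three splits, all on the same feature.
path = [("x", ">", 10), ("x", ">", 25), ("x", "<", 100)]
intervals = collapse_path(path)
print(intervals)  # {'x': (25, 100)}
```

So the three-deep x|x|x path is equivalent to "x falls in the bin (25, 100)", which is why it reads as binning rather than as an interaction between different features.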

So the question is: what should you do with this information? Well, in situations like this, my mind leans towards what might be getting missed because of the dominance of this feature. So the first thing I'd do is remove that feature and run the model again to see what else floats to the surface in terms of importance. Then I might stack the output (predictions) of that model as a new feature.
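A rough sketch of that drop-and-stack workflow, in plain Python so it runs without any libraries. Everything here is an assumption for illustration: `mean_model` is a trivial stand-in for whatever learner you actually use (e.g. an xgboost model), `stack_without_feature` is a made-up helper name, and the use of out-of-fold predictions is my addition (the usual way to stack without leaking the target), not something stated above:

```python
def mean_model(rows, targets):
    """Hypothetical stand-in learner: always predicts the training mean."""
    mean = sum(targets) / len(targets)
    return lambda row: mean

def stack_without_feature(rows, targets, dominant, fit, n_folds=3):
    """Fit on the data minus the dominant column, then append the
    out-of-fold predictions to the original rows as a new feature."""
    # Drop the dominant feature so weaker signals can surface.
    reduced = [[v for j, v in enumerate(row) if j != dominant] for row in rows]
    oof = [None] * len(rows)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(rows)) if i % n_folds != fold]
        test_idx = [i for i in range(len(rows)) if i % n_folds == fold]
        model = fit([reduced[i] for i in train_idx],
                    [targets[i] for i in train_idx])
        for i in test_idx:
            oof[i] = model(reduced[i])  # prediction from a model that never saw row i
    return [row + [pred] for row, pred in zip(rows, oof)]

rows = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0], [4.0, 8.0], [5.0, 9.0], [6.0, 10.0]]
targets = [1, 2, 3, 4, 5, 6]
stacked = stack_without_feature(rows, targets, dominant=0, fit=mean_model)
print(stacked[0])  # [1.0, 5.0, 4.0]
```

In practice you would swap `mean_model` for a real learner and inspect the importances of the reduced model before deciding whether the stacked column is worth keeping.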

Far0n commented

Thx Walt, that covers most of it. :) Additionally: adding an identical f10 would just bring in redundancy, which is not helpful.

Thank you Walt and Mathias. I guess given the dominance of the feature, I could also try adding transformations of it (binning etc.). Also, FWIW, in my experimentation adding an additional f10 didn't help, but adding f10 * f10 and f10 * f10 * f10 did improve the model.
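For anyone finding this later, the kind of derived features mentioned above could be built along these lines. This is only a sketch in plain Python: the helper names, the column index, and the bin edges are made up for illustration, and whether such features help is something to check empirically on your own data:

```python
from bisect import bisect_right

def add_powers(rows, col, max_power=3):
    """Append col**2 ... col**max_power to every row."""
    return [row + [row[col] ** p for p in range(2, max_power + 1)]
            for row in rows]

def add_bin(rows, col, edges):
    """Append the index of the bin that row[col] falls into.

    `edges` must be sorted; the values used below are arbitrary examples.
    """
    return [row + [bisect_right(edges, row[col])] for row in rows]

# Column 0 plays the role of f10 here.
rows = [[2.0, 1.0], [12.0, 0.0], [30.0, 1.0]]
with_powers = add_powers(rows, col=0)
with_bins = add_bin(rows, col=0, edges=[10, 25, 100])
print(with_powers[0])  # [2.0, 1.0, 4.0, 8.0]
print(with_bins)       # [[2.0, 1.0, 0], [12.0, 0.0, 1], [30.0, 1.0, 2]]
```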

Far0n commented

Thx for the feedback @yilisg