XGBoost Feature Interactions & Importance
Xgbfi is an XGBoost model dump parser that ranks features and feature interactions by several metrics.
Xgbfir - Python port
- Gain: Total gain of each feature or feature interaction
- FScore: Number of possible splits taken on a feature or feature interaction
- wFScore: Number of possible splits taken on a feature or feature interaction, weighted by the probability that each split takes place
- Average wFScore: wFScore divided by FScore
- Average Gain: Gain divided by FScore
- Expected Gain: Total gain of each feature or feature interaction weighted by the probability to gather the gain
- Average Tree Index
- Average Tree Depth
- Leaf Statistics
- Split Value Histograms
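The relationships between the scalar metrics above can be illustrated with a minimal sketch. It assumes each split is already available as a record with a gain and a probability of being reached; the `Split` class, the `aggregate_metrics` function, and the `age|income` interaction name are hypothetical and are not part of Xgbfi, and how Xgbfi itself derives the probabilities is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Split:
    feature: str        # feature or interaction name (hypothetical naming)
    gain: float         # gain reported for this split
    probability: float  # assumed probability that the split takes place


def aggregate_metrics(splits):
    """Aggregate per-feature metrics from a list of Split records."""
    totals = defaultdict(
        lambda: {"Gain": 0.0, "FScore": 0, "wFScore": 0.0, "Expected Gain": 0.0}
    )
    for s in splits:
        t = totals[s.feature]
        t["Gain"] += s.gain                           # total gain
        t["FScore"] += 1                              # one count per split
        t["wFScore"] += s.probability                 # probability-weighted count
        t["Expected Gain"] += s.gain * s.probability  # probability-weighted gain
    for t in totals.values():
        t["Average wFScore"] = t["wFScore"] / t["FScore"]
        t["Average Gain"] = t["Gain"] / t["FScore"]
    return dict(totals)


# Made-up input purely for illustration.
splits = [
    Split("age", 10.0, 1.0),
    Split("age", 4.0, 0.5),
    Split("age|income", 6.0, 0.25),
]
metrics = aggregate_metrics(splits)
```

Ranking by any one metric is then a matter of sorting `metrics.items()` on that key.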
Example:
[mono] XgbFeatureInteractions.exe [-help|options]
a) Creating a feature map (fmap)
def create_feature_map(fmap_filename, features):
    """
    features: enumerable of feature names
    """
    # Each line is "<index>\t<name>\tq"; "q" marks a quantitative feature.
    with open(fmap_filename, 'w') as outfile:
        for i, feat in enumerate(features):
            outfile.write('{0}\t{1}\tq\n'.format(i, feat))

create_feature_map('xgb.fmap', features)
b) Dumping an XGBoost model
gbdt.dump_model('xgb.dump', fmap='xgb.fmap', with_stats=True)
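To get a feel for what a dump written with with_stats=True contains, the following sketch reads the text format into plain dictionaries. It is not Xgbfi's parser: `parse_dump` and the sample text are illustrative assumptions, and the patterns only cover numeric splits of the form `[feature<threshold]` (indicator features, which are dumped without a threshold, are skipped).

```python
import re

# A split line looks like: 0:[age<30.5] yes=1,no=2,missing=1,gain=120.5,cover=100
SPLIT_RE = re.compile(
    r"^\s*(?P<id>\d+):\[(?P<feature>[^<\]]+)<(?P<threshold>[^\]]+)\]\s+"
    r"yes=(?P<yes>\d+),no=(?P<no>\d+),missing=(?P<missing>\d+),"
    r"gain=(?P<gain>[^,]+),cover=(?P<cover>\S+)$"
)
# A leaf line looks like: 1:leaf=0.1,cover=40
LEAF_RE = re.compile(r"^\s*(?P<id>\d+):leaf=(?P<leaf>[^,]+),cover=(?P<cover>\S+)$")


def parse_dump(text):
    """Return one list of node dicts per tree found in the dump text."""
    trees = []
    for line in text.splitlines():
        if line.startswith("booster["):
            trees.append([])  # a new tree starts at each booster header
            continue
        if not trees:
            continue  # ignore anything before the first header
        m = SPLIT_RE.match(line)
        if m:
            trees[-1].append({
                "id": int(m["id"]),
                "feature": m["feature"],
                "threshold": float(m["threshold"]),
                "gain": float(m["gain"]),
                "cover": float(m["cover"]),
            })
            continue
        m = LEAF_RE.match(line)
        if m:
            trees[-1].append({
                "id": int(m["id"]),
                "leaf": float(m["leaf"]),
                "cover": float(m["cover"]),
            })
    return trees


# Made-up sample in the layout of a dump with statistics enabled.
sample = (
    "booster[0]:\n"
    "0:[age<30.5] yes=1,no=2,missing=1,gain=120.5,cover=100\n"
    "\t1:leaf=0.1,cover=40\n"
    "\t2:leaf=-0.2,cover=60\n"
)
trees = parse_dump(sample)
```

The `gain` and `cover` fields are only present when the model is dumped with with_stats=True, which is why step b) enables it.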
c) Editing Parameters in XgbFeatureInteractions.exe.config
<setting name="XgbModelFile" serializeAs="String">
    <value>xgb.dump</value>
</setting>
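If the setting needs to be changed from a script rather than by hand, something along these lines may work. It is only a sketch: `set_setting` is a hypothetical helper, the replacement path is made up, and the code assumes nothing about the config file beyond the `<setting>` element shown above (in particular, that it uses no XML namespaces).

```python
import xml.etree.ElementTree as ET


def set_setting(root, name, value):
    """Set <value> of the first <setting name=...> found; return True if found."""
    for setting in root.iter("setting"):  # also matches root itself
        if setting.get("name") == name:
            setting.find("value").text = value
            return True
    return False


# Only the fragment from the step above; a real file wraps it in more elements.
fragment = (
    '<setting name="XgbModelFile" serializeAs="String">'
    "<value>xgb.dump</value>"
    "</setting>"
)
root = ET.fromstring(fragment)
found = set_setting(root, "XgbModelFile", "other.dump")  # hypothetical file name
updated = ET.tostring(root, encoding="unicode")
```

For a file on disk, `ET.parse(path)` followed by `tree.write(path)` would replace the string round trip.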
d) Running [mono] XgbFeatureInteractions.exe without command-line parameters