Repository for Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance
UPD: 6.12
This part is our classification/regression model.
Compile and run the example:

```bash
bash compile.sh
python3 example.py
```
`UnbiasedBoost`
- `losstool`: `'logloss'` for classification or `'MSE'` for regression
- `n_est`: number of estimators
- `min_leaf`: minimum number of instances at a leaf node
- `lr`: learning rate
- `n_leaf`: number of leaves per tree
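A minimal construction sketch; the import path is an assumption (adjust to however the compiled module is exposed in this repo), and the hyperparameter values are illustrative:

```python
# Hypothetical import path -- adapt to the module produced by compile.sh.
from UnbiasedBoost import UnbiasedBoost

# Regression setup using the parameters listed above; values are illustrative.
model = UnbiasedBoost(losstool='MSE', n_est=100, min_leaf=20, lr=0.1, n_leaf=31)
```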
`.fit`
- `df`: dataframe; categorical features should be input as `int`, and numerical features as `float`
- `label`: training labels
- `testset`: `tuple(metric, df_test, df_label)` (see example.py)
- `return_pred`: whether to return predictions on the testset
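A sketch of a training call, assuming `model` from above and that `.fit` accepts the arguments by the names listed; the `'MSE'` metric string and the column values are illustrative, so see example.py for the canonical usage:

```python
import pandas as pd

# Categorical features as int, numerical features as float, per the dtype rules above.
df = pd.DataFrame({'cat': pd.Series([0, 1, 2, 1], dtype='int64'),
                   'num': pd.Series([0.5, 1.2, 3.3, 2.1], dtype='float64')})
label = pd.Series([1.0, 0.0, 1.0, 0.0])

# Reusing the training frame as a stand-in test set purely for illustration;
# the metric name 'MSE' in the testset tuple is an assumption.
pred = model.fit(df, label, testset=('MSE', df, label), return_pred=True)
```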
`.predict`
- `df`: dataframe
`.calc_self_imp`
This method returns the feature importance measured during the training stage, which differs from our post-hoc method below.
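Continuing the sketch for these two methods; the signatures are assumed from the descriptions above:

```python
preds = model.predict(df)          # same dataframe dtype rules as .fit
train_imp = model.calc_self_imp()  # training-stage importance (assumed to take no arguments)
```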
This part is a post-hoc method. It supports XGBoost and LightGBM.
`UnbiasedGain.calc_gain(model, dataT, labelT, dataV, labelV, losstool)`
For example, with LightGBM:

```python
import lightgbm as lgb  # LightGBM's Python package
import UnbiasedGain     # this repository's post-hoc module

# `task`, `X_train`, `y_train`, `X_test`, `y_test` come from your own data split.
seed = 998244353
model = (lgb.LGBMRegressor if task == 'regression' else lgb.LGBMClassifier)(
    random_state=seed, learning_rate=1, n_estimators=5)
model.fit(X_train, y_train.values)
pred = model.predict(X_test) if task == 'regression' else model.predict_proba(X_test)[:, 1]

# Built-in split-gain importances, for comparison with the unbiased ones.
print(model.feature_importances_)

# The loss tool must match the task; calc_gain then computes the unbiased importances.
losstool = UnbiasedGain.MSE_tool() if task == 'regression' else UnbiasedGain.logloss_tool()
UnbiasedGain.calc_gain(model, X_train, y_train, X_test, y_test, losstool)
```