Feature Importance Doesn't Allow Feature Names
BenCurran98 opened this issue · 1 comment
Hi, I've noticed that since the recent PR, the feature importance function no longer accepts feature names as an argument. Consider this example:
using EvoTrees
using Statistics
using StatsBase: sample
# prepare a dataset
features = rand(Int(1.25e6), 100)
# features = rand(100, 10)
X = features
Y = rand(size(X, 1))
π = collect(1:size(X, 1))
# train-eval split
π_sample = sample(π, size(π, 1), replace=false)
train_size = 0.8
π_train = π_sample[1:floor(Int, train_size * size(π, 1))]
π_eval = π_sample[floor(Int, train_size * size(π, 1))+1:end]
x_train, x_eval = X[π_train, :], X[π_eval, :]
y_train, y_eval = Y[π_train], Y[π_eval]
config = EvoTreeRegressor(
loss=:linear,
nrounds=100,
nbins=100,
lambda=0.5,
gamma=0.1,
eta=0.1,
max_depth=6,
min_weight=1.0,
rowsample=0.5,
colsample=1.0)
model = fit_evotree(config; x_train = x_train, y_train = y_train, x_eval = x_eval, y_eval = y_eval, print_every_n = 1)
display(importance(model))
which gives an output
4-element Vector{Pair{String, Float64}}:
"feat_3" => 0.26565039212451985
"feat_4" => 0.2589711676696925
"feat_1" => 0.24700503862705744
"feat_2" => 0.22837340157873026
Could you please add another method that allows feature names to be passed to importance, as was possible previously? Thanks
Hi, sorry that this change caused you some trouble.
The rationale for the change is that the feature names are now stored in the model itself.
Therefore, if you want to provide your own feature names, there are two options:
- Provide fnames as a keyword argument to fit_evotree:
model = fit_evotree(params1; x_train, y_train, fnames = "my_feat_" .* string.(1:10));
- Update the model's info[:fnames] attribute at any time prior to calling importance:
fnames = "my_feat_" .* string.(1:10)
model.info[:fnames] = fnames
gain = importance(model)
Let me know if you have any issues with this new approach. I realize that this change in v0.12.0 wasn't properly documented, so I'll update the fit_evotree docs to reflect it.
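For anyone who prefers the old call style, the second option above can be wrapped in a small helper. The following is only a sketch: `with_fnames`, `MockModel`, and `mock_importance` are hypothetical names that are not part of EvoTrees, and the helper assumes that `model.info` is a mutable dictionary with an `:fnames` entry, as described in the reply. The mock model is there only so the snippet runs without EvoTrees installed.

```julia
# Hypothetical helper, not part of EvoTrees: set the stored feature names,
# then call an importance-like function `f` on the model.
# Assumes `model.info` is a mutable dictionary holding `:fnames`.
function with_fnames(f, model, fnames)
    model.info[:fnames] = fnames   # note: this mutates the model in place
    return f(model)
end

# Stand-in model and importance function so the sketch runs on its own.
struct MockModel
    info::Dict{Symbol,Any}
end

# Assigns equal weight to each stored feature name (illustration only).
mock_importance(m) = [name => 1 / length(m.info[:fnames]) for name in m.info[:fnames]]

mock = MockModel(Dict{Symbol,Any}(:fnames => ["feat_1", "feat_2"]))
gain = with_fnames(mock_importance, mock, ["age", "income"])
```

With EvoTrees loaded and a trained model, the equivalent call would presumably be `with_fnames(importance, model, fnames)`. Because the helper overwrites `info[:fnames]`, the new names stay on the model for any later calls.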