Evovest/EvoTrees.jl

Feature Importance Doesn't Allow Feature Names

BenCurran98 opened this issue · 1 comment

Hi, I've noticed that since the recent PR, the feature importance function no longer accepts feature names as an argument. Consider the example:

using EvoTrees
using Statistics
using StatsBase: sample

# prepare a dataset
features = rand(Int(1.25e6), 100)
# features = rand(100, 10)
X = features
Y = rand(size(X, 1))
𝑖 = collect(1:size(X, 1))

# train-eval split
𝑖_sample = sample(𝑖, size(𝑖, 1), replace=false)
train_size = 0.8
𝑖_train = 𝑖_sample[1:floor(Int, train_size * size(𝑖, 1))]
𝑖_eval = 𝑖_sample[floor(Int, train_size * size(𝑖, 1))+1:end]

x_train, x_eval = X[𝑖_train, :], X[𝑖_eval, :]
y_train, y_eval = Y[𝑖_train], Y[𝑖_eval]

config = EvoTreeRegressor(
    loss=:linear, 
    nrounds=100, 
    nbins=100,
    lambda=0.5, 
    gamma=0.1, 
    eta=0.1,
    max_depth=6, 
    min_weight=1.0,
    rowsample=0.5, 
    colsample=1.0)

model = fit_evotree(config; x_train = x_train, y_train = y_train, x_eval = x_eval, y_eval = y_eval, print_every_n = 1)

display(importance(model))

which gives an output

4-element Vector{Pair{String, Float64}}:
 "feat_3" => 0.26565039212451985
 "feat_4" => 0.2589711676696925
 "feat_1" => 0.24700503862705744
 "feat_2" => 0.22837340157873026

Could you please correct this to add another method that allows feature names to be entered into importance as it was previously? Thanks

Hi, sorry that this change caused you trouble.
The rationale for the change was that the feature names are now stored in the model itself.
Therefore, if you want to provide your own feature names, there are two options:

  1. Provide fnames as a keyword argument to fit_evotree:
model = fit_evotree(params1; x_train, y_train, fnames = "my_feat_" .* string.(1:10));
  2. Update the model's info[:fnames] attribute at any time prior to calling importance:
fnames = "my_feat_" .* string.(1:10)
model.info[:fnames] = fnames
gain = importance(model)
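
For code that still relies on the old call style, the second option can be wrapped in a small helper. The sketch below is not part of EvoTrees: `importance_with_names` is a hypothetical name, the data and hyperparameters are arbitrary, and it assumes the v0.12 behaviour described above (names stored in `model.info[:fnames]` and read by `importance`).

```julia
using EvoTrees

# Hypothetical helper, not an EvoTrees function: temporarily swaps the stored
# feature names, computes the importance, then restores the original names.
function importance_with_names(model, fnames)
    old_fnames = model.info[:fnames]
    model.info[:fnames] = fnames
    try
        return importance(model)
    finally
        # Always put the original names back, even if importance throws.
        model.info[:fnames] = old_fnames
    end
end

# Arbitrary synthetic data, only to exercise the helper.
x_train = rand(1_000, 10)
y_train = rand(1_000)
params1 = EvoTreeRegressor(nrounds=10, max_depth=4)
model = fit_evotree(params1; x_train, y_train)

fnames = "my_feat_" .* string.(1:10)
gain = importance_with_names(model, fnames)
```

Restoring the original names in the `finally` block keeps the call free of side effects on the model. If the names should stay attached to the model, setting `model.info[:fnames]` directly as in option 2 is simpler.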

Let me know if you have any issue with this new way of proceeding. I realize though that this change in v0.12.0 wasn't properly documented. I'll update the fit_evotree docs to reflect the changes.