Getting a warning when using multiple features
affans opened this issue · 3 comments
using EvoTrees
using EvoTrees: sigmoid, logit
using MLJBase
using RDatasets
iris = dataset("datasets", "iris")
iris[!, :is_setosa] = iris[!, :Species] .== "setosa"
features = setdiff(names(iris), ["Species", "is_setosa"])
Y, X, _ = unpack(iris, ==(:is_setosa), in(Symbol.(features)), colname -> true)
train, test = partition(eachindex(Y), 0.7, shuffle=true); # 70:30 split
tree_model = EvoTreeClassifier(
loss=:linear, metric=:mse,
nrounds=100, nbins = 100,
λ = 0.5, γ=0.1, η=0.1,
max_depth = 6, min_weight = 1.0,
rowsample=0.5, colsample=1.0)
mach = machine(tree_model, X, Y)
This results in the following warning:
Warning: The number and/or types of data arguments do not match what the specified model supports. Suppress this type check by specifying `scitype_check_level=0`.
│
│ Run `@doc EvoTreeClassifier{Float64, EvoTrees.Softmax, Int64}` to learn more about your model's requirements.
│ Commonly, but non exclusively, supervised models are constructed using the syntax `machine(model, X, y)` or `machine(model, X, y, w)` while most other models are constructed with `machine(model, X)`. Here `X` are features, `y` a target, and `w` sample or class weights.
Why could this be happening?
The warning refers to a constraint on the target variable: classification models used through the MLJ interface require a target whose scitype is AbstractVector{<:MLJModelInterface.Finite}. To satisfy it, use the categorical function to convert the target into the required format: Y = categorical(Y)
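As a minimal sketch of what the conversion changes, the snippet below compares the scientific type of a raw Bool vector with that of its categorical version. The toy vector y_raw is a hypothetical stand-in for the is_setosa column, and CategoricalArrays and ScientificTypes are loaded directly only to keep the example self-contained; in the thread's setup the same names come in through the packages already loaded.

```julia
using CategoricalArrays: categorical
using ScientificTypes: scitype, Finite

# Hypothetical toy target standing in for iris.is_setosa
y_raw = [true, false, true, false]

# Wrap the raw values in a CategoricalVector
y_cat = categorical(y_raw)

# The raw Bool vector is not interpreted as a Finite (categorical) target
raw_ok = scitype(y_raw) <: AbstractVector{<:Finite}

# The categorical version satisfies the classifier's target constraint
cat_ok = scitype(y_cat) <: AbstractVector{<:Finite}

println("raw vector accepted as target:         ", raw_ok)
println("categorical vector accepted as target: ", cat_ok)
```

Running the same check on your own target before calling machine is a quick way to see whether the warning will appear.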
Also note that for a classifier the loss should be softmax, not linear. That said, the loss argument can be ignored, since the constructor for EvoTreeClassifier enforces the softmax loss. The metric should likewise be mlogloss rather than mse.
To sum up, the following should work as intended:
using EvoTrees
using EvoTrees: sigmoid, logit
using MLJBase
using RDatasets
iris = dataset("datasets", "iris")
iris[!, :is_setosa] = iris[!, :Species] .== "setosa"
features = setdiff(names(iris), ["Species", "is_setosa"])
Y, X, _ = unpack(iris, ==(:is_setosa), in(Symbol.(features)), colname -> true)
Y_cat = categorical(Y)
train, test = partition(eachindex(Y), 0.7, shuffle=true); # 70:30 split
tree_model = EvoTreeClassifier(
loss=:softmax, metric=:mlogloss,
nrounds=100, nbins = 100,
λ = 0.5, γ=0.1, η=0.1,
max_depth = 6, min_weight = 1.0,
rowsample=0.5, colsample=1.0, rng=123)
mach = machine(tree_model, X, Y_cat)
fit!(mach, rows=train, verbosity=1)
# predict on the full dataset (X contains both the train and test rows)
pred_prob = predict(mach, X)
pred_mode = predict_mode(mach, X)
What's the difference between predict and predict_mode?
I think if you run @jeremiedb's code you will see the difference.
In MLJ, a model that can make probabilistic predictions does so by default. So for EvoTreeClassifier, for example, predict will output a vector yhat of UnivariateFinite objects. To get actual point predictions (classes) you can call predict_mode(mach, Xnew), which is the same as mode.(predict(mach, Xnew)).
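To make the distinction concrete, here is a small sketch that builds a single UnivariateFinite object by hand instead of fitting a model. The labels "no" and "yes" and the probabilities are made up for illustration, and the snippet assumes MLJBase makes UnivariateFinite, pdf and mode available; each element returned by predict is an object of this kind.

```julia
using MLJBase  # assumed to provide UnivariateFinite, pdf and mode

# Hypothetical probabilistic prediction for one observation:
# 20% "no", 80% "yes". `pool=missing` asks for a fresh categorical pool.
d = UnivariateFinite(["no", "yes"], [0.2, 0.8], pool=missing)

# What `predict` gives you: a distribution you can query per class
p_yes = pdf(d, "yes")

# What `predict_mode` gives you: the single most probable class
point = mode(d)

println("P(yes) = ", p_yes)
println("mode   = ", point)
```

With a fitted machine, the same queries apply elementwise to the vector returned by predict.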
(For future reference, this is probably a question for MLJ rather than EvoTrees.jl).