Response as factor or numeric (rf and xgbTree)
Closed this issue · 4 comments
Hi,
I am trying to do stacking using caretList. It is classification data. From what I understand, rf requires the response to be a factor, while xgboost needs it to be numeric.
I tried both scenarios: converting to a factor and running caretList, and converting to numeric and running caretList. Both ran.
However, the latter gives me warnings from rf, since it expects a factor response. So which is correct?
Regards
Germayne
Sorry, I think my question should be: for xgboost itself, when doing binary logistic classification, do I leave the response variable as a factor? Random forest and extraTrees require the response to be a factor, but the standalone xgboost that I know only accepts numeric labels.
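For reference, the two interfaces really do differ, but caret smooths the difference over. A minimal sketch of the contrast (assuming a data frame `dataset` with a two-level outcome `Class` whose positive level is hypothetically named `"yes"`; within caret, the outcome should stay a factor, because caret's `xgbTree` wrapper converts it to numeric internally, so one factor response works for rf, extraTrees, and xgbTree alike):

```r
library(caret)
library(xgboost)

# Standalone xgboost: the label must be numeric 0/1 and the
# predictors must be a numeric matrix
y_num  <- ifelse(dataset$Class == "yes", 1, 0)          # "yes" is a hypothetical level name
X      <- model.matrix(Class ~ . - 1, data = dataset)   # numeric design matrix
dtrain <- xgb.DMatrix(data = X, label = y_num)
fit_xgb <- xgb.train(params  = list(objective = "binary:logistic"),
                     data    = dtrain,
                     nrounds = 100)

# Inside caret: keep the response as a factor; train() requires a
# factor outcome for classification (especially with classProbs = TRUE)
# and handles the numeric conversion for xgbTree itself
dataset$Class <- factor(dataset$Class)
fit_caret <- train(Class ~ ., data = dataset, method = "xgbTree")
```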
example code:
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                        savePredictions = "final", classProbs = TRUE,
                        summaryFunction = LogLosSummary)
algorithmList <- c('glm', 'knn')

# set grids
#rf_grid <- expand.grid()
xgb_grid <- expand.grid(nrounds = 1000, eta = 0.1, max_depth = 5, gamma = 0,
                        colsample_bytree = 1, min_child_weight = 1, subsample = 1)

# methodList = algorithmList
models <- caretList(Class ~ ., data = dataset, trControl = control,
                    metric = "LogLoss",
                    methodList = algorithmList,
                    tuneList = list(
                      et  = caretModelSpec(method = "extraTrees", ntree = 1000),
                      rf  = caretModelSpec(method = "rf", ntree = 1000),
                      xgb = caretModelSpec(method = "xgbTree", tuneGrid = xgb_grid)
                    ))
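For the stacking step itself, the resulting caretList can be passed to caretStack. A sketch under stated assumptions: the `models` object above trained successfully, and the meta-model choice (`glm`) and resampling settings here are illustrative, not prescribed by the thread:

```r
library(caretEnsemble)

# Stack the base learners with a simple GLM meta-model;
# mnLogLoss is caret's built-in log-loss summary (its metric
# name is "logLoss")
stack <- caretStack(models,
                    method    = "glm",
                    metric    = "logLoss",
                    trControl = trainControl(method = "cv", number = 5,
                                             classProbs = TRUE,
                                             summaryFunction = mnLogLoss))

# Class probabilities from the stacked ensemble
preds <- predict(stack, newdata = dataset, type = "prob")
```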
Thank you. :) This cleared my doubts.