zachmayer/caretEnsemble

Response as factor or numeric (rf and xgbTree)

Closed this issue · 4 comments

Hi,

I am trying to do stacking using caretList on classification data. From what I understand, rf requires the response to be a factor, while xgboost needs the response to be numeric.

I tried both scenarios: converting the response to a factor and running caretList, and converting it to numeric and running caretList. Both ran.

The latter gives me warnings from rf, since it requires the response to be a factor. So which is correct?

Regards
Germayne

@zachmayer

Sorry, I think my question should be: for xgboost itself, if I am doing binary logistic classification, do I leave the response variable as a class factor? Random forest and extra trees require the response to be a factor, but the standalone xgboost that I know only takes numeric class labels.

example code:


control <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions= "final", classProbs=TRUE, summaryFunction = LogLosSummary)
algorithmList <- c('glm','knn')

# set grids 
#rf_grid <- expand.grid()
xgb_grid <- expand.grid(nrounds = 1000, eta = 0.1, max_depth = 5, gamma = 0, colsample_bytree = 1, min_child_weight = 1, subsample = 1)

# methodList=algorithmList
models <- caretList(Class ~ ., data = dataset, trControl = control, metric = "LogLoss",
                    methodList = algorithmList,
                    tuneList = list(
                      et  = caretModelSpec(method = "extraTrees", ntree = 1000),
                      rf  = caretModelSpec(method = "rf", ntree = 1000),
                      xgb = caretModelSpec(method = "xgbTree", tuneGrid = xgb_grid)
                    ))
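For what it's worth, caret's train() (which caretList calls for each model) treats a factor response as a classification problem for every method, including xgbTree; the conversion to the numeric labels xgboost needs happens internally. A minimal sketch (the toy labels here are illustrative, not from the thread):

library(caret)

# Keep the response as a factor for all methods, including xgbTree.
# With classProbs = TRUE, the factor levels must also be valid R
# variable names -- make.names() fixes levels like "0"/"1".
y <- factor(make.names(c(0, 1, 1, 0)))
levels(y)  # "X0" "X1"

So for the example above, converting dataset$Class once with factor(make.names(...)) should satisfy rf, extraTrees, and xgbTree at the same time.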

Thank you. :) This cleared my doubts.