zachmayer/caretEnsemble

Response as factor or numeric (rf and xgbTree)

Closed this issue · 4 comments

Hi,

I am trying to do stacking using caretList on classification data. From what I understand, rf requires the response to be a factor, while xgboost needs the response to be numeric.

I tried both scenarios: converting the response to a factor and running caretList, and converting it to numeric and running caretList. Both ran.

The latter gives me warnings from rf, since it requires the response to be a factor. So which is correct?

Regards
Germayne

@zachmayer

Sorry, I think my question should be: for xgboost itself, if I am doing binary logistic classification, do I leave the response variable as a class factor? Random forest and extra trees require the response to be a factor, but the standalone xgboost that I know only takes numeric class labels.

example code:


control <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions= "final", classProbs=TRUE, summaryFunction = LogLosSummary)
algorithmList <- c('glm','knn')

# set grids 
#rf_grid <- expand.grid()
xgb_grid <- expand.grid(nrounds = 1000, eta = 0.1, max_depth = 5, gamma = 0, colsample_bytree = 1, min_child_weight = 1, subsample = 1)

# methodList=algorithmList
models <- caretList(Class ~ ., data = dataset, trControl = control, metric = "LogLoss",
                    methodList = algorithmList,
                    tuneList = list(
                      et  = caretModelSpec(method = "extraTrees", ntree = 1000),
                      rf  = caretModelSpec(method = "rf", ntree = 1000),
                      xgb = caretModelSpec(method = "xgbTree", tuneGrid = xgb_grid)
                    ))
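For what it's worth, caret's train() (which caretList calls for each model) treats a factor response as a classification problem for every method, including xgbTree; the conversion to the numeric labels xgboost needs happens internally. A minimal sketch (the toy labels here are illustrative, not from the thread):

library(caret)

# Keep the response as a factor for all methods, including xgbTree.
# With classProbs = TRUE, the factor levels must also be valid R
# variable names -- make.names() fixes levels like "0"/"1".
y <- factor(make.names(c(0, 1, 1, 0)))
levels(y)  # "X0" "X1"

So for the example above, converting dataset$Class once with factor(make.names(...)) should satisfy rf, extraTrees, and xgbTree at the same time.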

Thank you. :) This cleared my doubts.