curso-r/treesnip

Quantile regression with lightgbm not possible


Quantile regression in lgb.train() requires objective = 'quantile', but this does not work through {treesnip} because the objective supplied in set_engine() is not respected.

library(magrittr)
library(treesnip)

# Model spec: boosted trees with the lightgbm engine, requesting the
# quantile objective through set_engine()
model <- parsnip::boost_tree(mtry = NULL, trees = 5, mode = "regression", learn_rate = .1)
model <- parsnip::set_engine(model, "lightgbm", objective = "quantile", metric = "l2", alpha = 0.1)

# Simulated data: four numeric predictors and an outcome
set.seed(4)
data <- matrix(rnorm(1000), ncol = 4) %>%
  tibble::as_tibble() %>%
  dplyr::mutate(y = sample(1000 / 4))

fit <- model %>%
  parsnip::fit(y ~ ., data = data)

predict(fit, data)
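
For contrast, here is a minimal sketch of the intended behaviour when calling lightgbm directly with the same simulated data (alpha = 0.1 targets the 10th percentile):

# Direct lgb.train() call: objective = 'quantile' is respected here
x <- as.matrix(data[, 1:4])
dtrain <- lightgbm::lgb.Dataset(data = x, label = data$y)
bst <- lightgbm::lgb.train(
  params = list(objective = "quantile", alpha = 0.1, learning_rate = 0.1),
  data = dtrain,
  nrounds = 5
)
predict(bst, x)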

The problem is that objective and others are overwritten in train_lightgbm(). I think arguments such as objective passed via ... to train_lightgbm() should always take precedence in this function over the ones derived from other variables.

What was the rationale behind giving others lower precedence in https://github.com/curso-r/treesnip/blob/master/R/lightgbm.R#L237? I think others should have precedence; we could achieve this by replacing that line with arg_list <- modifyList(arg_list, others) and removing the merging of the two later in the code. I can make a PR if you want.
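
For illustration, a minimal sketch of the proposed semantics; arg_list and others here are toy stand-ins for the corresponding variables in train_lightgbm():

# arg_list: arguments derived from parsnip's main parameters;
# others: engine arguments supplied via set_engine()
arg_list <- list(objective = "regression", learning_rate = 0.1)
others   <- list(objective = "quantile", alpha = 0.1)

# modifyList() lets entries in `others` override matching entries in
# `arg_list`, so the user-supplied objective wins
arg_list <- utils::modifyList(arg_list, others)
str(arg_list)
#> List of 3
#>  $ objective    : chr "quantile"
#>  $ learning_rate: num 0.1
#>  $ alpha        : num 0.1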

I don't think we have a strong argument for giving others a lower precedence. I think we followed what parsnip does for xgboost, see: https://github.com/tidymodels/parsnip/blob/master/R/boost_tree.R#L382-L384

I agree with your suggestion to give others a higher precedence, so you can use whichever objective you want. We might need to implement a quantile prediction method (like parsnip's predict_quantile(): https://github.com/tidymodels/parsnip/blob/9333a0d08764c28eb12337e0bc95160a20462356/R/predict_quantile.R#L9) too.

Well, I have the apprehension that Max had a reason to do it that way... Maybe he wanted to make sure you can't override things that are not engine-specific in set_engine(). The same problem probably exists when you want to use an objective other than 'regression' or 'classification' with xgboost in {parsnip}; there are quite a number of objectives in xgboost.
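
To illustrate, a hypothetical example of the analogous xgboost situation; reg:tweedie stands in for any of xgboost's many non-default objectives:

library(magrittr)
# Hypothetical example: requesting a non-default xgboost objective
# through set_engine(); at the time of this issue parsnip silently
# replaced it with its own default, just like treesnip does here
spec <- parsnip::boost_tree(trees = 5, mode = "regression") %>%
  parsnip::set_engine("xgboost", objective = "reg:tweedie")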

The quantile method sounds very cool too 🎉. Unfortunately, I am not familiar enough with parsnip to contribute that right now.

Two things:

  • I don't think implementing a quantile method as you suggested in #24 (comment) will work here, because lightgbm's quantile regression is different from other algorithms that implement quantile methods: the quantile is not an argument to predict(lightgbm_model, ...) but needs to be set at training time (see the sketch after this list).
  • For consistency, I opened an issue in https://github.com/tidymodels/parsnip to discuss the general handling of this: tidymodels/parsnip#403
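
For concreteness, a sketch of that consequence, assuming the proposed precedence fix is in place; fit_quantile() is a hypothetical helper and data is the simulated tibble from the original report:

library(magrittr)
# Because lightgbm fixes the quantile at training time via `alpha`,
# obtaining several quantiles means fitting one model per quantile
# rather than passing quantiles to predict()
fit_quantile <- function(alpha, data) {
  parsnip::boost_tree(trees = 5, mode = "regression") %>%
    parsnip::set_engine("lightgbm", objective = "quantile", alpha = alpha) %>%
    parsnip::fit(y ~ ., data = data)
}
fits <- lapply(c(0.1, 0.5, 0.9), fit_quantile, data = data)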

Update: My suggested changes were incorporated into parsnip for xgboost in tidymodels/parsnip#403 and @topepo said he'd also submit a PR for this here.

I've reproduced @topepo's logic to allow passing objective to set_engine() for both catboost and lightgbm! Thank you!
tidymodels/parsnip#403