paobranco/UBL

Combining EvalRegressMetrics with caret


Hello,
The caret and recipes packages allow one to define customised evaluation metrics for models. I am trying to use UBL::EvalRegressMetrics within a caret and recipes workflow.

I was wondering whether you (or anyone else) have had any success with this, or would be able to comment on my attempt below (using glmnet).

I define the following functions:

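return.phi <- function(dat, control.parms = phiF.argsR) {
  # Inputs:
  #   dat: numeric vector of target values
  #   control.parms: relevance function definition, as returned by phi.control()
  # Output:
  #   phi value for each point in dat, in the original order of dat
  dat_sort <- sort(dat)
  # Position of each element of dat within dat_sort (ties broken by order)
  idx <- rank(dat, ties.method = "first")
  # Use the control.parms argument rather than the global phiF.argsR
  phi_vals <- phi(dat_sort, control.parms)
  phi_vals[idx]
}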
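
As a quick sanity check of the sort/rank pattern used in return.phi, the sketch below applies the same steps with a stand-in function. Here `toy_relevance` and `apply_in_original_order` are hypothetical names for illustration only; `toy_relevance` is not UBL::phi, just a simple rescaling that is easy to verify by hand.

```r
# Stand-in for a relevance function (hypothetical, NOT UBL::phi):
# rescales values to [0, 1] using the min and range of the input.
toy_relevance <- function(y) (y - min(y)) / diff(range(y))

# Same pattern as return.phi: evaluate on the sorted vector,
# then map the results back to the original order.
apply_in_original_order <- function(dat, f) {
  dat_sort <- sort(dat)
  idx <- rank(dat, ties.method = "first")  # integer positions in dat_sort
  f(dat_sort)[idx]
}

dat <- c(30, 10, 20, 10)
restored <- apply_in_original_order(dat, toy_relevance)
direct <- toy_relevance(dat)

# The round trip should agree with applying the function directly
print(all.equal(restored, direct))
```

If the two results agree, the reordering step itself is unlikely to be the source of any problem.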

I then define model_stats, a summary function in the form caret expects when training with a recipe (see https://topepo.github.io/caret/using-recipes-with-train.html):

model_stats <- function(data, lev = NULL, model = NULL) {
  # Adds EvalRegressMetrics to defaultSummary
  stats <- defaultSummary(data, lev = lev, model = model)
  utils <- UBL::EvalRegressMetrics(data$pred, data$obs, util.vals = data$utility)
  c(FPhi = utils$FPhi, totUtil = utils$totUtil, MUtil = utils$MUtil, stats)
}
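
A summary function of this shape can also be called outside train() on a small hand-built data frame, which helps confirm that the extra column is read as intended. The sketch below is base R only and uses a hypothetical weighted RMSE (`toy_stats`) in place of EvalRegressMetrics; the column names `obs`, `pred` and `utility` follow the usage in model_stats, and the data values are made up.

```r
# Hypothetical summary function with the same signature as model_stats.
# It computes a weighted RMSE instead of calling UBL::EvalRegressMetrics.
toy_stats <- function(data, lev = NULL, model = NULL) {
  if (is.null(data$utility)) {
    stop("column 'utility' was not passed to the summary function")
  }
  w_rmse <- sqrt(weighted.mean((data$pred - data$obs)^2, data$utility))
  c(wRMSE = w_rmse)
}

# Made-up fold: only the last row carries weight, and it is off by 2
fake_fold <- data.frame(
  obs     = c(1, 2, 3, 4),
  pred    = c(1, 2, 3, 6),
  utility = c(0, 0, 0, 1)
)

res <- toy_stats(fake_fold)
print(res)
```

Calling the real model_stats on a similar data frame, with two different `pred` columns, would show directly whether its output reacts to the predictions.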

I then apply this to the Boston data:

library(glmnet)
library(recipes)
library(caret)
library(UBL)

data(Boston, package = "MASS")

# Column index of the target; defined before phi.control() uses it
tgt <- which(colnames(Boston) == "medv")

# Create utility surface
phiF.argsR <- phi.control(Boston[, tgt], method = "extremes", extr.type = "both")

# Train/test split
set.seed(101)
sp <- sample(1:nrow(Boston), as.integer(0.7*nrow(Boston)))
train <- Boston[sp,]
test <- Boston[-sp,]

#Add utility for each data point based on tgt
train <- train  %>% 
  mutate(utility = return.phi(medv, control.parms=phiF.argsR))

#Define recipe
tr_recipe <- recipe(medv ~ ., data = train) %>%
  add_role(utility, new_role="performance var") %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())
tr_recipe

head(train)


glmnet_control<-trainControl(method="cv", 
                             number=10, 
                             savePredictions = "all",
                             summaryFunction = model_stats)

#Fit the model using the recipe
set.seed(1)
glmnet_fit<-train(tr_recipe, data=train, 
                  method="glmnet",
                  trControl=glmnet_control,
                  metric=c("MUtil")
                  )
glmnet_fit$results

While the default measures such as RMSE vary across the different models tested, the EvalRegressMetrics values (FPhi, totUtil, MUtil) are constant over models.

This has been the case for various models and datasets that I have tested.
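
One way to narrow down behaviour like this is to check whether a metric reacts at all when the predictions change and everything else is held fixed. The sketch below is base R with two hypothetical metrics (`uses_pred` and `ignores_pred`, neither taken from UBL) and a hypothetical helper `depends_on_pred`; it only illustrates the check and makes no claim about how EvalRegressMetrics computes its values.

```r
# Two hypothetical metrics: one uses the predictions, one does not.
uses_pred    <- function(data) mean(abs(data$pred - data$obs) * data$utility)
ignores_pred <- function(data) mean(data$utility)

# Returns TRUE if shifting the predictions changes the metric value
depends_on_pred <- function(metric, data, shift = 1) {
  shifted <- data
  shifted$pred <- shifted$pred + shift
  !isTRUE(all.equal(metric(data), metric(shifted)))
}

# Made-up fold for illustration
fold <- data.frame(
  obs     = c(10, 20, 30),
  pred    = c(12, 18, 33),
  utility = c(1, 0, 1)
)

a <- depends_on_pred(uses_pred, fold)
b <- depends_on_pred(ignores_pred, fold)
print(c(uses_pred = a, ignores_pred = b))
```

A metric for which this check returns FALSE cannot distinguish between models, so it would appear constant in the resampling results whatever the tuning parameters are.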

I am not sure whether the problem lies in UBL, caret, or recipes, but any thoughts on what might be causing this behaviour would be much appreciated.

Thanks
Iris