carlesmila/NNDM

caret::train() returns wrong accuracy metrics with NNDM and bLOOCV


Hi,

I noticed that caret::train() returns nonsensical accuracy statistics when used with NNDM and bLOOCV indices. For instance, running the example you provided in the Readme.md of the package, I got:

> mod_NNDM$results
  mtry min.node.size splitrule     RMSE Rsquared      MAE   RMSESD RsquaredSD    MAESD
1    2             5  variance 139.2695      NaN 139.2695 148.7611         NA 148.7611

The MAE value is right, but for some reason the RMSE is wrong (it equals the MAE) and Rsquared is NaN.

Looking at your Readme.rmd file on GitHub, I saw that you didn't use the RMSE returned by caret::train() but calculated it yourself. I would suggest making that clear in the package README by showing the code, to avoid confusion.

I also wonder how this issue affects the tuning parameter selection when using caret::train(), as it seems that the model will be tuned to minimize the MAE rather than the RMSE.

Maybe it is better to tune the model without caret?
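To illustrate the tuning concern, here is a minimal base R sketch. The two error vectors are made-up held-out errors for two hypothetical candidate settings (they do not come from the package example), chosen only to show that ranking by MAE and ranking by RMSE can pick different candidates:

```r
# Made-up held-out errors for two hypothetical candidate settings
err_a <- c(1, 1, 1, 9)   # mostly small errors, one large error
err_b <- c(4, 4, 4, 4)   # uniformly moderate errors

mae  <- function(e) mean(abs(e))
rmse <- function(e) sqrt(mean(e^2))

scores <- data.frame(
  candidate = c("a", "b"),
  MAE  = c(mae(err_a),  mae(err_b)),
  RMSE = c(rmse(err_a), rmse(err_b))
)

# Pick the candidate with the lowest score under each metric
best_by_mae  <- as.character(scores$candidate[which.min(scores$MAE)])
best_by_rmse <- as.character(scores$candidate[which.min(scores$RMSE)])
```

Here the MAE prefers candidate "a" while the RMSE prefers candidate "b", because the RMSE penalises the single large error more heavily.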

Hi @AramburuMerlos, thanks for filing this. I've updated the README to show the custom score computation and added a warning to make this clear. The issue is that, with this custom configuration, caret computes each score on the held-out observation of a single iteration (i.e. just one data point each time) and then averages the scores across iterations. With one observation the RMSE reduces to the absolute error, so the averaged RMSE equals the MAE, and R2 cannot be computed from a single observation.
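The effect can be reproduced in base R without caret. This is only a sketch: the `obs` and `pred` values are made up, and the `rmse` helper is defined here for illustration, so it is not necessarily the exact computation used in the README.

```r
# Made-up held-out observations and predictions, one pair per iteration
obs  <- c(10, 20, 30, 40)
pred <- c(12, 17, 30, 48)

rmse <- function(o, p) sqrt(mean((o - p)^2))

# Scoring each iteration separately: RMSE over one point is just |error|
per_fold_rmse <- mapply(rmse, obs, pred)
per_fold_mae  <- abs(obs - pred)

mean_fold_rmse <- mean(per_fold_rmse)  # average of single-point RMSEs
mae            <- mean(per_fold_mae)   # same value as mean_fold_rmse

# Pooling all held-out predictions first gives a genuine RMSE
pooled_rmse <- rmse(obs, pred)

# The variance of a single value is undefined, so a correlation-based
# R2 cannot be computed within one iteration
single_var <- var(obs[1])
```

With these numbers the averaged per-iteration RMSE equals the MAE, while the pooled RMSE is larger, which matches the symptom in the results table above.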