caret::train() returns wrong accuracy metrics with NNDM and bLOOCV
Hi,
I noticed that caret::train() returns nonsensical accuracy statistics when used with NNDM and bLOOCV indices. For instance, using the example you provided in the Readme.md of the package, I got
> mod_NNDM$results
  mtry min.node.size splitrule     RMSE Rsquared      MAE   RMSESD RsquaredSD    MAESD
1    2             5  variance 139.2695      NaN 139.2695 148.7611         NA 148.7611
The MAE value is right, but for some reason the RMSE is wrong (it is equal to the MAE) and Rsquared is NaN.
By inspecting your Readme.rmd file on GitHub, I discovered that you didn't use the RMSE returned by caret::train() but calculated it yourself. So I would suggest making that clear in the Readme documentation of the package by showing the code, to avoid confusion.
I also wonder how this issue affects the tuning parameter selection when using caret::train(), as it seems that the model will be tuned to minimize the MAE rather than the RMSE. Maybe it is better to tune the model without caret?
Hi @AramburuMerlos, thanks for filing this. I've updated the README by making the custom score computation visible and added a warning to make it clear. The issue is that, with this custom configuration, caret computes each score on the out-of-sample observation of every fold (i.e. just one data point each time) and then averages those per-fold scores. With a single observation the RMSE of a fold is simply its absolute error, which is why RMSE = MAE after averaging, and R2 cannot be computed with just one observation.
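To make the explanation above concrete, here is a minimal base-R sketch of the effect. It does not call caret; the `obs` and `pred` vectors are made-up values standing in for the held-out observations and their predictions, one per leave-one-out fold.

```r
# Hypothetical held-out observations and predictions, one per fold
# (illustrative numbers only, not taken from the package example).
obs  <- c(10, 20, 35, 50)
pred <- c(12, 17, 30, 58)

# Metrics computed fold by fold, each on a single held-out point.
fold_rmse <- sapply(seq_along(obs), function(i) sqrt(mean((obs[i] - pred[i])^2)))
fold_mae  <- sapply(seq_along(obs), function(i) mean(abs(obs[i] - pred[i])))

# Averaging the per-fold values: the "RMSE" collapses to the MAE,
# because sqrt(e^2) = |e| when a fold holds a single error e.
mean(fold_rmse)
mean(fold_mae)

# A correlation (and hence R2) is undefined for a single point.
fold_r2 <- suppressWarnings(cor(obs[1], pred[1])^2)
fold_r2

# Pooling all held-out predictions first gives the usual RMSE and R2.
pooled_rmse <- sqrt(mean((obs - pred)^2))
pooled_r2   <- cor(obs, pred)^2
pooled_rmse
pooled_r2
```

With caret itself, one way to get the pooled predictions would presumably be to set `savePredictions = "final"` in `trainControl()` and compute the statistics from the `obs` and `pred` columns of the fitted model's `$pred` data frame; whether this matches the computation now shown in the README is an assumption on my part.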