`$evaluation` results from `rf_evaluate()` sensitive to order of character vector passed to `metrics=` argument
tl;dr
Incorrect values of evaluation metrics can be reported depending on the order of the character vector passed as the `metrics=` argument to `rf_evaluate()`. This happens because of a forced renaming of columns based on a hard-coded order. It can be fixed with a small patch (let me know if you want a PR!).
Some details
I came across this issue while working to spatially cross-validate a non-spatial model with a binomial response. I was using AUC and R^2 as my evaluation metrics and wanted to add RMSE. Once I did, my AUC values appeared in the row labeled RMSE and vice versa. At first I was shocked at how low my AUC was! Then I realized I was working with the same data (and the same random seed), so something else must have been going on.
In looking closely at the model object updated with the `$evaluation` data, I see that the `model$evaluation$per.fold` data frame appears to be correct, but that `model$evaluation$per.fold.long` and `model$evaluation$per.model` appear to be wrong. The rows in `model$evaluation$per.fold.long` and `model$evaluation$per.model` that correspond to the "full" model also look to be correct; only the rows representing "training" and "testing" have their metrics and values mislabeled.
All of this suggests to me that, in the `rf_evaluate()` code, `evaluation.df` is correct, `performance.full` is correct, and there's an issue with `performance.training` and `performance.testing`.
I think the problem is in the renaming of the columns here:
Lines 403 to 406 in 9bc7edd
If the order of the `metrics` character vector happens to match the order of the metrics in the columns of `performance.testing` and `performance.training`, then there is no problem. But if the order in `metrics` is different, then the assignment coerces the column names into the order provided by the `metrics=` argument (`c(metrics, "model")`).
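The mechanism is easy to reproduce outside of spatialRF. The following toy example (all names illustrative, not the package's internals) shows how a positional rename mislabels columns whenever the user-supplied order differs from the internal one:

```r
# Metrics are computed in a fixed internal column order (rmse first here),
# then renamed positionally from the user-supplied metrics vector:
performance <- data.frame(rmse = 0.88, auc = 0.38, model = "Training")
metrics <- c("auc", "rmse")                   # user order differs
colnames(performance) <- c(metrics, "model")  # positional rename
performance$auc  # 0.88 -- this is really the RMSE value, now labeled "auc"
```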
Possible fix
This approach appears to work: remove the current approach to defining the `colnames()` and instead substitute the `training`/`testing` part of the column names in `performance.training` and `performance.testing` with nothing, then subset the `performance.training` and `performance.testing` objects by column name as you do with `performance.full`. That is:
performance.training <- dplyr::select(
  evaluation.df,
  dplyr::contains("training")
)
performance.training[, 1] <- NULL
# strip everything up to the last "." so only the metric name remains
colnames(performance.training) <- gsub(pattern = ".*\\.", replacement = "", x = colnames(performance.training))
performance.training$model <- "Training"
# subset by column name, which is safe for any metrics order
performance.training <- performance.training[, c(metrics, "model")]
performance.testing <- dplyr::select(
  evaluation.df,
  dplyr::contains("testing")
)
performance.testing[, 1] <- NULL
colnames(performance.testing) <- gsub(pattern = ".*\\.", replacement = "", x = colnames(performance.testing))
performance.testing$model <- "Testing"
performance.testing <- performance.testing[, c(metrics, "model")]
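For reference, the `gsub()` call used above strips everything up to and including the last `.`, so (assuming the per-fold columns carry a fold prefix, e.g. `training.rmse`, as the pattern implies) only the bare metric names remain, and the final subset by `c(metrics, "model")` selects columns by name rather than position:

```r
# Illustrative prefixed column names, as selected from evaluation.df:
cols <- c("training.auc", "training.rmse")
gsub(pattern = ".*\\.", replacement = "", x = cols)
# "auc" "rmse"
```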
Minimum working example
This initial block is copied exactly from your excellent tutorial:
library(spatialRF)
library(dplyr)
#loading training data and distance matrix from the package
data(plant_richness_df)
data(distance_matrix)
#names of the response variable and the predictors
dependent.variable.name <- "richness_species_vascular"
predictor.variable.names <- colnames(plant_richness_df)[5:21]
#coordinates of the cases
xy <- plant_richness_df[, c("x", "y")]
#distance matrix
distance.matrix <- distance_matrix
#distance thresholds (same units as distance_matrix)
distance.thresholds <- c(0, 1000, 2000, 4000, 8000)
#random seed for reproducibility
random.seed <- 1
plant_richness_df$response_binomial <- ifelse(
  plant_richness_df$richness_species_vascular > 5000,
  1,
  0
)
model.non.spatial <- spatialRF::rf(
  data = plant_richness_df,
  dependent.variable.name = "response_binomial",
  predictor.variable.names = predictor.variable.names,
  distance.matrix = distance.matrix,
  distance.thresholds = distance.thresholds,
  seed = random.seed,
  verbose = FALSE
)
The next block shows the issue by running `rf_evaluate()` in two different ways, with only the order of the `metrics=` argument switched.
model.non.spatial.2 <- model.non.spatial
model.non.spatial <- spatialRF::rf_evaluate(
  model.non.spatial,
  xy = xy,
  metrics = c("auc", "rmse"),
  verbose = FALSE
)
model.non.spatial.2 <- spatialRF::rf_evaluate(
  model.non.spatial.2,
  xy = xy,
  metrics = c("rmse", "auc"),
  verbose = FALSE
)
The output from `print_evaluation()` shows the problem: the values for AUC are switched with the values for RMSE!
> spatialRF::print_evaluation(model.non.spatial)
Spatial evaluation
- Training fraction: 0.75
- Spatial folds: 29
Metric Median MAD Minimum Maximum
auc 0.378 0.046 0.288 0.504
rmse 0.882 0.044 0.708 0.960
> spatialRF::print_evaluation(model.non.spatial.2)
Spatial evaluation
- Training fraction: 0.75
- Spatial folds: 29
Metric Median MAD Minimum Maximum
auc 0.882 0.044 0.708 0.960
rmse 0.378 0.046 0.288 0.504
Troubleshooting code from `rf_evaluate()` that traced the problem to the `colnames()` assignment
Continue this block using the code from the minimum working example above.
evaluation.df <- model.non.spatial$evaluation$per.fold
metrics <- c("auc", "rmse") # The order I specified in the "bad" rf_evaluate() call
performance.training <- dplyr::select(
  evaluation.df,
  dplyr::contains("training")
)
performance.training[, 1] <- NULL
performance.training$model <- "Training"
performance.testing <- dplyr::select(
  evaluation.df,
  dplyr::contains("testing")
)
performance.testing[, 1] <- NULL
performance.testing$model <- "Testing"
r.squared <- model.non.spatial$performance$r.squared
pseudo.r.squared <- model.non.spatial$performance$pseudo.r.squared
rmse <- model.non.spatial$performance$rmse
nrmse <- model.non.spatial$performance$nrmse
auc <- model.non.spatial$performance$auc
#check lengths
if(length(r.squared) == 0){
  r.squared <- NA
}
if(length(pseudo.r.squared) == 0){
  pseudo.r.squared <- NA
}
if(length(rmse) == 0){
  rmse <- NA
}
if(length(nrmse) == 0){
  nrmse <- NA
}
if(length(auc) == 0){
  auc <- NA
}
#full model
performance.full <- data.frame(
  r.squared = r.squared,
  pseudo.r.squared = pseudo.r.squared,
  rmse = rmse,
  nrmse = nrmse,
  auc = auc,
  model = "Full"
)
performance.full <- performance.full[, c(metrics, "model")]
colnames(performance.training) <- colnames(performance.testing) <- colnames(performance.full) <- c(
  metrics,
  "model"
)
# metrics are now mislabeled
performance.training
performance.testing
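The buggy and fixed paths can also be contrasted end to end with toy data, independent of spatialRF (the column names below are illustrative stand-ins for the prefixed per-fold columns):

```r
# Toy per-fold frame mimicking the structure selected from above
evaluation.df <- data.frame(
  training.records = c(187, 187),
  training.rmse    = c(0.90, 0.86),
  training.auc     = c(0.40, 0.36)
)
metrics <- c("auc", "rmse")  # user order differs from column order

# Buggy path: positional rename, as in the colnames() assignment above
buggy <- evaluation.df[, -1]
colnames(buggy) <- metrics
buggy$auc   # 0.90 0.86 -- these are the RMSE values, mislabeled

# Fixed path: strip the fold prefix, then reorder by name
fixed <- evaluation.df[, -1]
colnames(fixed) <- gsub(".*\\.", "", colnames(fixed))
fixed <- fixed[, metrics]
fixed$auc   # 0.40 0.36 -- correct
```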
Hi,
Thank you very much for pointing out this critical problem!
However, I noticed it while working on the package during my holidays and submitted a fix to the development version a few weeks ago. I have been re-testing it today just to be sure, and it seems to be working as expected.
If you are going to try it, please keep in mind that the development version is still a work in progress, and some things are surely broken.
Thanks again for keeping those issues coming. I truly appreciate it!
Cheers,
Blas
[edit]: I just submitted a new version of rf_evaluate() fixing an issue that was also related to the order of the metrics!
Great news! I think we can safely close this issue then.