egouldo/ManyEcoEvo

double-checking calculation of out-of-sample estimates pipeline

Closed this issue · 1 comments

Hi @itchyshin , I am double-checking the analysis pipeline for calculating the out-of-sample yi estimates for use in our meta-analysis.

We standardise the out-of-sample estimates for each analysis by:

Function Z_VZ_preds() takes the out-of-sample estimate, yi standard error of the yi estimate yi_se, as well as the relevant population mean and sd for the response variable used in that analysis:

Z <- (yi - mu_p) / sd_p
VZ <- yi_se / sd_p

And Z_VZ_preds() is applied to all analyses in the function pred_to_Z():

pred_to_Z <- function(back_transformed_data,
params,
dataset) {
# TODO test: str_detect(response_variable_name) in param table
match.arg(dataset, choices = c("blue tit", "eucalyptus"), several.ok = FALSE)
if (rlang::is_na(params) | rlang::is_na(back_transformed_data)) {
cli::cli_warn("Argument {.arg params} or {.arg back_transformed_data} is {.val {NA}}. Returning {.val {NA}} for standardized predictions.")
return(NA)
}
if (dataset == "blue tit") {
if (!pointblank::test_col_exists(back_transformed_data,
columns = c("estimate", "se.fit")
)) {
cli::cli_warn("Blue tit Dataframe {.arg back_transformed_data} is missing columns {.val estimate} and/or {.val se.fit}. Returning {.val {NA}} for standardized predictions.")
return(NA)
}
sd_p <- params %>%
dplyr::filter(parameter == "sd") %>%
purrr::pluck("value")
mu_p <- params %>%
dplyr::filter(parameter == "mean") %>%
purrr::pluck("value")
standardised_preds <-
back_transformed_data %>%
mutate(
res = map2(
estimate,
se.fit,
~ Z_VZ_preds(
yi = .x,
yi_se = .y,
sd_p = sd_p,
mu_p = mu_p
)
),
.keep = c("unused")
) %>%
hoist(res, "Z", "VZ") %>%
select(-starts_with("ci."))
} else {
if (!pointblank::test_col_exists(back_transformed_data,
columns = c("fit", "se.fit")
)) {
cli::cli_warn("Eucalyptus Dataframe {.arg back_transformed_data} is missing columns {.val fit} and/or {.val se.fit}. Returning {.val {NA}} for standardized predictions.") # TODO remove hard-coding, generalise
return(NA)
}
sd_p <- params %>%
filter(parameter == "sd") %>%
pluck("value")
mu_p <- params %>%
filter(parameter == "mean") %>%
pluck("value")
standardised_preds <-
back_transformed_data %>%
mutate(
res = map2(fit, se.fit, ~ Z_VZ_preds(.x, .y, sd_p, mu_p)),
.keep = c("unused")
) %>%
hoist(res, "Z", "VZ") %>%
select(-starts_with("ci."))
}
return(standardised_preds)
}

Note that within pred_to_Z(), the standardisation of the estimates is performed on back_transformed_data, which contains data on the response_scale, not on the link-scale or otherwise. The function convert_predictions() applies the relevant *_back() transformation functions depending on what transformation is needed.

Note that the *_back transformation functions take the out of sample estimate, and the standard error, generating a new distribution on the appropriate scale with the relevant _back() transformation function. From this, we take the yi <- mean(distribution) and the yi_est <- sd(distribution). For example:

cube_back <- function(beta, se, sim) {
simulated <- rnorm(sim, beta, se)
original <- pracma::nthroot(simulated, n = 3) %>% # inverse of x^3, use non-base to allow for -ve numbers
na.omit()
m_est <- mean(original)
se_est <- sd(original)
quantiles <- quantile(original, c(0.025, 0.975), na.rm = TRUE)
set <- data.frame(mean_origin = m_est, se_origin = se_est, lower = quantiles[[1]], upper = quantiles[[2]])
if (flatten_dbl(set) %>%
map_lgl(.f = ~ is.na(.x) | is.nan(.x) | is.infinite(.x)) %>%
any()) {
cli::cli_alert_danger("{.val NA}, {.val Inf} or {.val NaN} returned during back-transformation of effect sizes and standard errors.")
}
cli::cli_alert_success("Applied back-transformation for cubed effect sizes")
return(set)
}
.

My concern is that actually we are parsing the standard deviation (computed with every _back() function) to the function Z_VZ_preds() when actually, we need the standard error to calculate the standardised SE estimate, which is a linear transformation VZ <- yi_se / sd_p.

met with @itchyshin this morning. Confirmed that sd has been incorrectly assigned as se in this pipeline.

  • Calculate SE and align to se_est in all _back() functions.