Hi @itchyshin , I am double-checking the analysis pipeline for calculating the out-of-sample yi estimates for use in our meta-analysis.
We standardise the out-of-sample estimates for each analysis by:
Function Z_VZ_preds()
takes the out-of-sample estimate, yi
standard error of the yi
estimate yi_se
, as well as the relevant population mean and sd for the response variable used in that analysis:
|
Z <- (yi - mu_p) / sd_p |
|
VZ <- yi_se / sd_p |
And Z_VZ_preds()
is applied to all analyses in the function pred_to_Z()
:
|
pred_to_Z <- function(back_transformed_data, |
|
params, |
|
dataset) { |
|
# TODO test: str_detect(response_variable_name) in param table |
|
match.arg(dataset, choices = c("blue tit", "eucalyptus"), several.ok = FALSE) |
|
if (rlang::is_na(params) | rlang::is_na(back_transformed_data)) { |
|
cli::cli_warn("Argument {.arg params} or {.arg back_transformed_data} is {.val {NA}}. Returning {.val {NA}} for standardized predictions.") |
|
return(NA) |
|
} |
|
|
|
if (dataset == "blue tit") { |
|
if (!pointblank::test_col_exists(back_transformed_data, |
|
columns = c("estimate", "se.fit") |
|
)) { |
|
cli::cli_warn("Blue tit Dataframe {.arg back_transformed_data} is missing columns {.val estimate} and/or {.val se.fit}. Returning {.val {NA}} for standardized predictions.") |
|
return(NA) |
|
} |
|
sd_p <- params %>% |
|
dplyr::filter(parameter == "sd") %>% |
|
purrr::pluck("value") |
|
mu_p <- params %>% |
|
dplyr::filter(parameter == "mean") %>% |
|
purrr::pluck("value") |
|
|
|
standardised_preds <- |
|
back_transformed_data %>% |
|
mutate( |
|
res = map2( |
|
estimate, |
|
se.fit, |
|
~ Z_VZ_preds( |
|
yi = .x, |
|
yi_se = .y, |
|
sd_p = sd_p, |
|
mu_p = mu_p |
|
) |
|
), |
|
.keep = c("unused") |
|
) %>% |
|
hoist(res, "Z", "VZ") %>% |
|
select(-starts_with("ci.")) |
|
} else { |
|
if (!pointblank::test_col_exists(back_transformed_data, |
|
columns = c("fit", "se.fit") |
|
)) { |
|
cli::cli_warn("Eucalyptus Dataframe {.arg back_transformed_data} is missing columns {.val fit} and/or {.val se.fit}. Returning {.val {NA}} for standardized predictions.") # TODO remove hard-coding, generalise |
|
return(NA) |
|
} |
|
sd_p <- params %>% |
|
filter(parameter == "sd") %>% |
|
pluck("value") |
|
mu_p <- params %>% |
|
filter(parameter == "mean") %>% |
|
pluck("value") |
|
|
|
standardised_preds <- |
|
back_transformed_data %>% |
|
mutate( |
|
res = map2(fit, se.fit, ~ Z_VZ_preds(.x, .y, sd_p, mu_p)), |
|
.keep = c("unused") |
|
) %>% |
|
hoist(res, "Z", "VZ") %>% |
|
select(-starts_with("ci.")) |
|
} |
|
|
|
return(standardised_preds) |
|
} |
Note that within pred_to_Z()
, the standardisation of the estimates is performed on back_transformed_data
, which contains data on the response_scale, not on the link-scale or otherwise. The function convert_predictions()
applies the relevant *_back()
transformation functions depending on what transformation is needed.
Note that the *_back
transformation functions take the out of sample estimate, and the standard error, generating a new distribution on the appropriate scale with the relevant _back()
transformation function. From this, we take the yi <- mean(distribution)
and the yi_est <- sd(distribution)
. For example:
|
cube_back <- function(beta, se, sim) { |
|
simulated <- rnorm(sim, beta, se) |
|
original <- pracma::nthroot(simulated, n = 3) %>% # inverse of x^3, use non-base to allow for -ve numbers |
|
na.omit() |
|
m_est <- mean(original) |
|
se_est <- sd(original) |
|
quantiles <- quantile(original, c(0.025, 0.975), na.rm = TRUE) |
|
set <- data.frame(mean_origin = m_est, se_origin = se_est, lower = quantiles[[1]], upper = quantiles[[2]]) |
|
if (flatten_dbl(set) %>% |
|
map_lgl(.f = ~ is.na(.x) | is.nan(.x) | is.infinite(.x)) %>% |
|
any()) { |
|
cli::cli_alert_danger("{.val NA}, {.val Inf} or {.val NaN} returned during back-transformation of effect sizes and standard errors.") |
|
} |
|
cli::cli_alert_success("Applied back-transformation for cubed effect sizes") |
|
return(set) |
|
} |
.
My concern is that actually we are parsing the standard deviation (computed with every _back()
function) to the function Z_VZ_preds()
when actually, we need the standard error to calculate the standardised SE estimate, which is a linear transformation VZ <- yi_se / sd_p
.