invert OR option
consultfiv opened this issue · 5 comments
Describe the bug
This it is not a bug itself, it is just a recommendation for an improvement in the emmeans function, in order to make improved console's outputs.
I tried and achieved to get an improve output in the console that I think it provides easier and faster interpretations of the results.
Expected behaviour
This R script (requires packages "emmeans" and "dplyr") creates a function that performs several key steps for analyzing and transforming the results of a generalized linear model (GLM) using the emmeans package. Here’s a breakdown of the process:
After the emmeans function calculates the estimated marginal means for the Feature1 levels, and pairwise comparisons between these levels are performed, the results are converted to a tibble and numeric values are rounded to three decimal places for clarity.
For odds ratios less than 1, their inverse is calculated, along with inverses for the corresponding confidence intervals (LCL and UCL). The contrasts are also reversed. The dataframe is updated to include the transformed odds ratios and confidence intervals, and variables are renamed to reflect these changes. The dataframe is ordered by the odds ratio in descending order to highlight the most significant effects.
This script ensures that the analysis correctly handles and interprets odds ratios and their confidence intervals, especially when dealing with ratios less than 1, and organizes the results for easy interpretation.
To reproduce
(To facilitate its use, I converted it directly to a function. I also deleted comments, if you want them I can also provide comments for each action)
library(emmeans)
library(dplyr)
invert_OR <- function(emmeans_obj) {
emmeans_df <- as_tibble(emmeans_obj)
emmeans_df$contrast <- as.character(emmeans_df$contrast)
emmeans_df <- emmeans_df %>%
dplyr::mutate(
new_odds.ratio = ifelse(odds.ratio < 1, 1 / odds.ratio, odds.ratio),
new_asymp.LCL = ifelse(odds.ratio < 1, 1 / asymp.UCL, asymp.LCL),
new_asymp.UCL = ifelse(odds.ratio < 1, 1 / asymp.LCL, asymp.UCL),
new_contrast = ifelse(odds.ratio < 1,
sapply(strsplit(as.character(contrast), " / "),
function(x) paste(rev(x), collapse = " / ")),
contrast)
) %>%
dplyr::select(new_contrast, new_odds.ratio, SE, df, new_asymp.LCL, new_asymp.UCL, null, z.ratio, p.value) %>%
dplyr::rename(
contrast = new_contrast,
odds.ratio = new_odds.ratio,
asymp.LCL = new_asymp.LCL,
asymp.UCL = new_asymp.UCL
)
emmeans_df$contrast <- factor(emmeans_df$contrast, levels = unique(emmeans_df$contrast))
emmeans_df <- emmeans_df %>%
arrange(desc(odds.ratio))
options(scipen = 999)
emmeans_df[] <- lapply(emmeans_df, function(x) {
if (is.numeric(x)) {
round(x, 3)
} else {
x
}
})
return(emmeans_df)
}
example:
set.seed(135)
n <- 100
Feature1 <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
Class <- factor(sample(c("Class1", "Class2"), n, replace = TRUE))
dataset <- data.frame(
Feature1 = Feature1,
Class = Class
)
glm.1 <- glm(Class ~ ., data = dataset, family = family.glm)
emmeans1 <- emmeans(glm.1, pairwise ~ Feature1, type = "response") %>%
pairs(reverse = F, infer = T)
invert_OR(emmeans1)
Show the actual output
# A tibble: 6 × 9
contrast odds.ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 D / C 2.42 0.239 Inf 0.548 10.8 1 -1.53 0.42
2 D / B 2.22 0.264 Inf 0.494 10 1 -1.36 0.522
3 D / A 1.57 0.338 Inf 0.400 6.21 1 -0.853 0.829
4 A / C 1.54 0.894 Inf 0.345 6.85 1 0.741 0.881
5 A / B 1.41 0.829 Inf 0.311 6.39 1 0.585 0.937
6 B / C 1.09 0.687 Inf 0.216 5.50 1 0.138 0.999
Additional context
As you can see, with the functionality of invert_OR( ), which it would be desireble to be as an aditional option included in the original function, every relationship is given in odds ratio bigger than 1, facilitating the interpretation of the relationship between each term of each pair.
Emmeans regular output look like this:
> emmeans1
contrast odds.ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
A / B 1.410 0.829 Inf 0.3114 6.39 1 0.585 0.9367
A / C 1.538 0.894 Inf 0.3455 6.85 1 0.741 0.8805
A / D 0.635 0.338 Inf 0.1612 2.50 1 -0.853 0.8292
B / C 1.091 0.687 Inf 0.2164 5.50 1 0.138 0.9991
B / D 0.450 0.264 Inf 0.1000 2.03 1 -1.364 0.5223
C / D 0.412 0.239 Inf 0.0932 1.83 1 -1.529 0.4199
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
Intervals are back-transformed from the log odds ratio scale
P value adjustment: tukey method for comparing a family of 4 estimates
Tests are performed on the log odds ratio scale
Thanks for your suggestion. I went through and edited your comment. adding Markdown markup for code, so you can see the indentation and alignment. I also removed the "bug" tag, since this isn't a bug.
I guess this illustrates that different people have different preferences in how to present statistical results. But I personally do not regard this as an improvement at all. The fact that in the original contrast
output, some odds ratios are greater than 1 and others are less than 1 is important information that is immediately apparent, and I think reversing the direction of some ratios adds confusion.
In addition, the original output contains annotations that inform us about P-value adjustments, etc. that are not present in the tibble version. These annotations are important enough to me that I have a special startup message warning people not to suppress them (see ? untidy
). I also am not myself a fan of tibbles; yes, data frames are idiosyncratic, and tibbles "solve" some of them; but they do so by creating other idiosyncrasies and add advertising ("a tibble"), unneeded row numbers, and distracting information about storage types. If we are going to add extra lines to the output, I'd rather show information that is actually useful. I realize that is just my opinion.
There are other packages that re-package emmeans's output in various ways to suit the preferences of particular user communities. I do recognize that different users have different needs, but I think the invert_OR()
function would better be incorporated in a separate package, rather than in emmeans.
By the way, your specification pairwise ~ Feature1
is unneeded; you'll get the same results using ~ Feature1
while avoiding computing the pairwise comparisons twice.
Also, a simple way to obtain the same ordering without reversing any odds ratios is to sort by p.value
Thank you for your attention and for your responses. I understand all you said. Anyway, congrats for building such a great package which I have been enjoying and recommending since the day I discovered it.
I'm closing this issue, as it is resolved.