Problems using alternative tests (i.e. anova) PART-II

Question

Problems using alternative tests (i.e. anova) PART-II

gundalav opened this issue 5 years ago · 2 comments

I have the following data

library(tidyverse)
library(ggsignif)
set.seed(1)
a <- rnorm(n = 10, mean = 5, sd = 1)
b <- rnorm(n = 10, mean = 5.8, sd = 1)

data <- data.frame(label = c(rep("A", 10), rep("B", 10)), id = c(1:10, 1:10), 
                   value = c(a, b))

And I perform the one-way anova pot using this code:

a <- aov( value ~ label, data)
summary(a)

Which produces:

            Df Sum Sq Mean Sq F value Pr(>F)  
label        1  4.201   4.201   4.793  0.042 *
Residuals   18 15.779   0.877                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Notice the p-value is 0.042.

However when I tried with ggsignif with this code:

ggplot(data, aes(x = label, y = value )) + 
  geom_boxplot() + 
  geom_signif(comparison = list(c("A",  "B")), y_position = 11, test = function(a, b) {
    list(p.value = summary(aov(a ~ b))[[1]][["Pr(>F)"]][[1]])
  })

I got this:

Note the p-value stated there is 0.28 instead of 0.042.
What's the right way to do it, so that I can put the 0.042 value inside the plot?

Answer 1 · 2019-12-10T09:07:39.000Z

Hi gundalav,

thank you for the detailed and easy to reproduce question.

The problem is that arguments a and b in the test=function(a, b){...} work slightly different than you assume. The easiest way to see this is to add browser() call inside the function like this (which is actually a super-powerful general way to debug problems):

ggplot(data, aes(x = label, y = value )) + 
  geom_boxplot() + 
  geom_signif(comparison = list(c("A",  "B")), y_position = 11, test = function(a, b) {
    browser()
    list(p.value = summary(aov(a ~ b))[[1]][["Pr(>F)"]][[1]])
  })

To get the p-value that you expect, you would have to do the following:

ggplot(data, aes(x = label, y = value )) + 
  geom_boxplot() + 
  geom_signif(comparison = list(c("A",  "B")), y_position = 11, test = function(a, b) {
    tmp_label <- c(rep("A", length(a)), rep("B", length(b)))
    tmp_values <- c(a, b)
    list(p.value = summary(aov(tmp_values ~ tmp_label))[[1]][["Pr(>F)"]][[1]])
  })

Best, Constantin

Answer 2 · 2019-12-18T00:34:09.000Z

Thank you. That does it.