Problems using alternative tests (i.e. anova) PART-II
gundalav opened this issue · 2 comments
I have the following data
library(tidyverse)
library(ggsignif)
set.seed(1)
a <- rnorm(n = 10, mean = 5, sd = 1)
b <- rnorm(n = 10, mean = 5.8, sd = 1)
data <- data.frame(label = c(rep("A", 10), rep("B", 10)), id = c(1:10, 1:10),
value = c(a, b))
And I perform the one-way anova pot using this code:
a <- aov( value ~ label, data)
summary(a)
Which produces:
Df Sum Sq Mean Sq F value Pr(>F)
label 1 4.201 4.201 4.793 0.042 *
Residuals 18 15.779 0.877
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Notice the p-value is 0.042.
However when I tried with ggsignif with this code:
ggplot(data, aes(x = label, y = value )) +
geom_boxplot() +
geom_signif(comparison = list(c("A", "B")), y_position = 11, test = function(a, b) {
list(p.value = summary(aov(a ~ b))[[1]][["Pr(>F)"]][[1]])
})
I got this:
Note the p-value stated there is 0.28 instead of 0.042.
What's the right way to do it, so that I can put the 0.042 value inside the plot?
Hi gundalav,
thank you for the detailed and easy to reproduce question.
The problem is that arguments a
and b
in the test=function(a, b){...}
work slightly different than you assume. The easiest way to see this is to add browser()
call inside the function like this (which is actually a super-powerful general way to debug problems):
ggplot(data, aes(x = label, y = value )) +
geom_boxplot() +
geom_signif(comparison = list(c("A", "B")), y_position = 11, test = function(a, b) {
browser()
list(p.value = summary(aov(a ~ b))[[1]][["Pr(>F)"]][[1]])
})
To get the p-value that you expect, you would have to do the following:
ggplot(data, aes(x = label, y = value )) +
geom_boxplot() +
geom_signif(comparison = list(c("A", "B")), y_position = 11, test = function(a, b) {
tmp_label <- c(rep("A", length(a)), rep("B", length(b)))
tmp_values <- c(a, b)
list(p.value = summary(aov(tmp_values ~ tmp_label))[[1]][["Pr(>F)"]][[1]])
})
Best, Constantin
Thank you. That does it.