Plotting normality of residuals: Scaling issues / differences
mutlusun opened this issue Β· 16 comments
Dear package authors,
Thank you very much for all your work and effort for the easystats packages. I like and use them a lot and they make my life definitely easier.
Recently, I stumbled upon a scaling "issue" with a data set consisting of two time points. Here is a small example for examination:
set.seed(1234)
x <- rnorm(100)
y <- rnorm(100, mean = x, sd = .5)
myd <- data.frame(x, y)
When fitting a regression model to this data and plotting the residuals, I get the impression that something is off here:
fit <- lm(y ~ x, data = myd)
tmp <- performance::check_normality(fit)
plot(tmp, type = "qq")
However, plotting in a different way doesn't show such a strong deviation:
plot(tmp, type = "qq", detrend = FALSE)
First of all, I'm not sure whether these differences in visualization are an "issue" of this package. When looking at the y-axis in the first plot, it is clear that the deviations don't have a strong magnitude. However, I still find it a bit strange that the same function suggests different interpretations.
Would it be an option to set certain minimal limits for the plot in the case of detrend = TRUE
? I'm happy to implement this but wanted to ask for your thoughts before.
Note: This is not such an issue with qqplotr
installed as the bands are often very wide and the interval wider.
The second plot shows a very strong deviation. Those tails are much wider than expected under a normal distribution. The first plot better shows the amount of the deviation by removing the angular axis.
This issue is related to easystats/performance#643 (comment).
When you look at the examples (https://easystats.github.io/see/articles/performance.html#qq-plot), you see that detrend = TRUE
often works well. But sometimes, due to the different limits of the y-axis, the plots look rather different (although they are not).
I still find it a bit strange that the same function suggests different interpretations.
It does not. It's just the scale that "zooms in" for detrend = TRUE
, and actually, for detrend = TRUE
, the deviation is usually easier to see.
That said, it may be surprising to users to see that large differences. Maybe we find a way to adjust the y-axis, so the differences are not that large.
We set the non-detrended y axis limits to fixed values if I recall correctly. We should do the same with appropriate values for the detrended y axis.
No, I think for both plots, we don't set limits, thus ggplot2 decides which range to use, based on the range of the data.
Line 214 in 1a090e1
The question is if it's possible to find reasonable y-limits for the detrended plot?
Thank you for your replies!
Sorry, for being sloppy in my description above. As both of you pointed out, the tails show a strong deviation and both plots show the same thing but the scaling is different. I am aware of that. I was, for example, more concerned with the range of -1 to 1 of the standard normal distribution quantiles which could be interpreted differently between both plots (when not paying attention to the limits of the y-axes).
The question is if it's possible to find reasonable y-limits for the detrended plot?
I have thought about that a while but I think it is hard to come up with reasonable y-limits. Is there a good rule of thumb for deciding when there is a deviation? (This way minimum y-limits could be set. But, I didn't find one and it is probably misleading to decide for one.)
As an alternative: Would it be possible to set qqplotr
as a hard dependency for this plot? This could be an easy way to increase the y-limits a bit.
Adding on this topic, I found these results interesting.
qqplot with detrend = TRUE

qqplot with detrend = FALSE

They provide completely opposite results π€―. This makes interpretation pretty troublesome.
Running shapiro.test()
on the residuals I obtain p-value = 0.8866
.
Do you have any idea on how to interpret these opposite results?
This is not just a toy example, I initially obtained these results from a real study. I am not sure if these strange results are related to the characteristics of the study: small fixed effect (Marginal R2 ~ 12%), but very large random effect (conditional R2 ~ 90%), so the remaining residual variance is really small.
Below is a minimal working example to simulate data and reproduce the plots.
Click me - Snippet Code
library('tidyverse')
library('ggplot2')
library('lme4')
set.seed(2024)
# create data
df <- expand_grid(
id = 1:20,
trial = 1:20,
condition = 0:1
) %>%
mutate(
across(c(id, trial, condition), factor)
)
contrasts(df$condition) <- contr.sum(2)
# generate random effect
ran_ef <- rnorm(20, mean = 0, sd =.45)
# get model matrix random effects
my_model <- "y ~ condition + (1 | id)"
df$y <- 1 # lFormula need y
foo <- lFormula(eval(my_model), df)
Z <- t(as.matrix(foo$reTrms$Zt))
# get model matrix fixed effects
V <- model.matrix(~ condition, data = df)
# simulate y
df$y <- as.vector(V %*% c(1.4, .15) + Z %*% ran_ef + rnorm(nrow(df), mean = 0, sd = .15))
# plot data
ggplot(df) +
geom_point(aes(x = id, y = y, color = condition),
alpha = .6, position = position_jitter(width = .2))
# fit model
my_fit <- lmer(y ~ condition + (1 | id), data = df)
performance::check_model(my_fit)
performance::r2(my_fit)
performance::check_normality(my_fit)
shapiro.test(residuals(my_fit))
qqnorm(residuals(my_fit))
tmp <- performance::check_normality(my_fit)
plot(tmp, type = "qq")
plot(tmp, type = "qq", detrend = FALSE)
By the way, thank you for the amazing packageβ€οΈβ€οΈ
It seems like qqplotr
is not installed? I get completely different plots:
library(easystats)
#> # Attaching packages: easystats 0.7.1.2
#> β bayestestR 0.13.2.1 β correlation 0.8.4.2
#> β datawizard 0.10.0.4 β effectsize 0.8.8
#> β insight 0.19.11.2 β modelbased 0.8.7
#> β performance 0.11.0.9 β parameters 0.21.7.1
#> β report 0.5.8.2 β see 0.8.4.1
library(lme4)
#> Loading required package: Matrix
set.seed(2024)
# create data
df <- expand.grid(
id = 1:20,
trial = 1:20,
condition = 0:1
) |>
to_factor()
contrasts(df$condition) <- contr.sum(2)
# generate random effect
ran_ef <- rnorm(20, mean = 0, sd = .45)
# get model matrix random effects
my_model <- "y ~ condition + (1 | id)"
df$y <- 1 # lFormula need y
foo <- lFormula(eval(my_model), df)
Z <- t(as.matrix(foo$reTrms$Zt))
# get model matrix fixed effects
V <- model.matrix(~condition, data = df)
# simulate y
df$y <- as.vector(V %*% c(1.4, .15) + Z %*% ran_ef + rnorm(nrow(df), mean = 0, sd = .15))
# fit model
my_fit <- lmer(y ~ condition + (1 | id), data = df)
tmp <- performance::check_normality(my_fit)
plot(tmp, type = "qq")
plot(tmp, type = "qq", detrend = FALSE)
Created on 2024-05-22 with reprex v2.1.0
Yes, thank you for the clarification!
After installing qqplotr
I obtain more reasonable (normalπ
) results
Before I obtained the message
For confidence bands, please install 'qqplotr'.
I thought it was only an "additional" feature, not a requirement.
Thank you for the immediate help!
I thought it was only an "additional" feature, not a requirement.
Yes, usually, it should indeed only be for CIs. But qqplotr has a detrend-argument that does the transformation of the plot, and w/o that package, we have written our own detrend-functions, which obviously doesn't work well in all situations.
@mattansb any ideas?
The CIs are an additional feature enabled by qqplotr, but they really do make possible the interpretation of the the detrended QQ plot (and the non-detrended plot as well!)
I meant the own detrend-code:
Lines 281 to 304 in 7dc72a6
Any ideas, why the above plot (#335 (comment)) deviates that much when qqplotr is not installed?
Hmmmmm.... there's some scaling issue here. I'll look into it.
This isn't 100%, but it's very close.........
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.3.3
g <- rnorm(20, sd = 15)
old
ggplot() +
stat_qq(aes(sample = g,
y = after_stat(sample - theoretical),
x = after_stat(theoretical)))
fix?
# Re scale?
N <- length(g)
SD <- sd(g) * sqrt((N - 1) / N)
ggplot() +
stat_qq(aes(sample = g,
y = after_stat(sample - theoretical * SD),
x = after_stat(theoretical * SD)))
similiar to qqplotr
ggplot() +
qqplotr::stat_qq_point(aes(sample = g), detrend = TRUE)
Created on 2024-05-22 with reprex v2.1.0
Thanks, I'll revise the code in see.