easystats/see

`p_function()` plot

strengejacke opened this issue · 19 comments

Like in https://journals.sagepub.com/doi/10.1177/02683962221105904

WDYT? @easystats/core-team

library(easystats)
#> # Attaching packages: easystats 0.5.2.13
#> ✔ bayestestR  0.13.0.1    ✔ correlation 0.8.3    
#> ✔ datawizard  0.6.3.3     ✔ effectsize  0.8.2    
#> ✔ insight     0.18.6.2    ✔ modelbased  0.8.5.1  
#> ✔ performance 0.10.0.11   ✔ parameters  0.19.0.14
#> ✔ report      0.5.5.3     ✔ see         0.7.3.2
library(ggplot2)

p_function_plot <- function(model, ci_method = NULL, param) {
  x <- suppressMessages(ci(model, ci_method = ci_method, ci = seq(0, 1, .01), verbose = FALSE))
  
  # data for plotting
  out <- data_filter(x, !is.infinite(CI_low) & !is.infinite(CI_high))
  out <- out[out$Parameter == param, ]
  out$CI <- round(out$CI, 2)

  # CI level emphasize
  line_size <- c(0.8, 0.8, 0.8, 0.9)

  # transform non-Gaussian
  info <- insight::model_info(model)
  if (info$is_linear) {
    delta <- 0
  } else {
    delta <- 1
    out$CI_low <- exp(out$CI_low)
    out$CI_high <- exp(out$CI_high)
  }

  # data for p_function ribbon
  data_ribbon <- data_to_long(out, select = c("CI_low", "CI_high"), values_to = "x")
  
  # data for vertical CI level lines
  data_ci_segments <- data_filter(out, CI %in% c(.25, .5, .75, .95))
  data_ci_segments$color <- "green"
  data_ci_segments$color[data_ci_segments$CI == 0.95] <- "red"
  
  # most plausible value (point estimate)
  point_estimate <- out$CI_low[which.min(out$CI)]

  ggplot() +
    geom_ribbon(
      data = data_ribbon,
      mapping = aes(x = x, ymin = 0, ymax = 1 - CI),
      color = "#1b6ca8",
      fill = "#1b6ca8",
      alpha = .1
    ) +
    geom_point(
      data = data_ci_segments,
      mapping = aes(x = CI_low, y = 1 - CI, colour = color)
    ) +
    geom_point(
      data = data_ci_segments,
      mapping = aes(x = CI_high, y = 1 - CI, colour = color)
    ) +
    geom_segment(
      data = data_ci_segments,
      mapping = aes(x = CI_low, y = 1 - CI, xend = CI_high, yend = 1 - CI, colour = color),
      size = line_size
    ) +
    geom_segment(
      mapping = aes(x = point_estimate, xend = point_estimate, y = 0, yend = 1),
      linetype = "dashed",
      color = "#cccccc"
    ) +
    annotate(
      geom = "text",
      x = point_estimate,
      y = 0.025,
      label = sprintf("%.3f", point_estimate)
    ) +
    scale_y_continuous(
      limits = c(0, 1),
      breaks = seq(0, 1, .05),
      sec.axis = sec_axis(
        trans = ~ 1 - .,
        name = "Compatibility Interval",
        breaks = seq(0, 1, .05),
        labels = sprintf("%g%%", round(100 * seq(0, 1, .05)))
      ),
      expand = c(0, 0)
    ) +
    scale_color_manual(
      values = c("#3aaf85", "#cd201f"),
      guide = "none"
    ) + 
    labs(y = "P value", x = "Range of Estimates", colour = NULL) +
    theme_lucid() +
    theme(panel.grid.minor.y = element_blank())
}

model <- glm(vs ~ mpg * cyl, data = mtcars, family = "binomial")
p_function_plot(model, ci_method = "wald", param = "mpg")

model <- lm(Sepal.Length ~ Species, data = iris)
p_function_plot(model, param = "Speciesversicolor")

Created on 2022-11-02 with reprex v2.0.2

The exp() is of course not needed for linear models

It's really good! I'd by default extend the plot to include the null hypothesis (e.g., 0) and have a vertical bar there (like in the paper) to show the position of the p-value reported by the model in regard to the other plausible values. It would give a sense of perspective

Neat! I like it. Agree with Dom. Would use lower-case italicized p, but that's just an APA convention.

#250 implements a basic plotting function, could be improved:

library(easystats)
#> # Attaching packages: easystats 0.5.2.13
#> ✔ bayestestR  0.13.0.1    ✔ correlation 0.8.3    
#> ✔ datawizard  0.6.3.3     ✔ effectsize  0.8.2    
#> ✔ insight     0.18.6.2    ✔ modelbased  0.8.5.1  
#> ✔ performance 0.10.0.11   ✔ parameters  0.19.0.15
#> ✔ report      0.5.5.3     ✔ see         0.7.3.4
model <- lm(Sepal.Length ~ Species + Sepal.Width + Petal.Length, data = iris)

plot(p_function(model))

plot(p_function(model, keep = "Petal.Length"))

plot(p_function(model, keep = "Petal.Length", ci_levels = c(.1, .3, .5, .9)))

Created on 2022-11-03 with reprex v2.0.2

would be cool to use facet_grid() rather than patchwork, or to have all in one plot, but I'm not sure how to add the Parameter values as additional grouping variable to ggplot.

Ok, I changed the implementation, we now have facets. This current implementation is a bit ugly, so there's a lot of room for improvements. Feel free to contribute to #250!

Requires easystats/parameters#808 to be installed.

library(parameters)
library(see)

model <- lm(Sepal.Length ~ Species + Sepal.Width + Petal.Length, data = iris)
p <- p_function(model)

plot(p)

plot(p, grid = TRUE)

Created on 2022-11-04 with reprex v2.0.2

@bwiernik @DominiqueMakowski @rempsyc ?

  • get rid of one of the two legends when we have no facets
  • add back one legend title when we have no facets
  • remove legend for facets
  • how to have two color scales when we have no facets? I.e. one color-scale for the ribbons, and one for the vertical bars?
  • drop intercept by default?
  • do we need to highlight the 95% CI at all?
  • include the null hypothesis (e.g., 0) and have a vertical bar there
  • italic lower case p
  • get rid of one of the two legends when we have no facets
  • add back one legend title when we have no facets
  • remove legend for facets
  • how to have two color scales when we have no facets? i.e. one color-scale for the ribbons, and one for the vertical bars?

I think rather than color, the lines for different confidence levels should be differentiated by line thickness. Make 25/50/75 thin lines and the focal hypothesis line (95) a thick line. Don't include a legend for either of these (with show.legend = FALSE).

  • drop intercept by default?

Yes

  • do we need to highlight the 95% CI at all?

Yes, I think including a focal hypothesis line makes this sort of plot a lot more accessible to people new to it. We can include a conf_level argument to let users customize which levels they want highlighted.

Current state. Dots are maybe a bit too large. Is dark grey OK for vertical CI lines?

library(parameters)
library(see)

model <- lm(Sepal.Length ~ Species + Sepal.Width + Petal.Length, data = iris)
p <- p_function(model)

plot(p, grid = FALSE)

plot(p, grid = TRUE)

plot(p, ci_emphasize = 0.75, grid = FALSE)

plot(p, ci_emphasize = 0.75, grid = TRUE)

p <- p_function(model, keep = "Petal.Length")

plot(p, grid = FALSE)

plot(p, grid = TRUE)

plot(p, ci_emphasize = 0.75, grid = FALSE, size_line = c(.4, 1.5))

plot(p, ci_emphasize = 0.75, grid = TRUE)

Created on 2022-11-04 with reprex v2.0.2

library(parameters)
library(see)

model <- lm(Sepal.Length ~ Species + Sepal.Width + Petal.Length, data = iris)
p <- p_function(model)

plot(p, show_values = FALSE)

plot(p, show_values = TRUE)

plot(p, grid = TRUE, show_values = FALSE)

plot(p, grid = TRUE, show_values = TRUE)

Created on 2022-11-04 with reprex v2.0.2

Given that we use "grid" to mean data grids elsewhere, maybe name that argument "facet"?

I think we have n_columns as argument to activate facet_wrap() or have an integrated plot. But I'll change grid to n_columns.

library(parameters)
library(see)

model <- lm(Sepal.Length ~ Species + Sepal.Width + Petal.Length, data = iris)
p <- p_function(model)

plot(p)

plot(p, show_labels = FALSE)

plot(p, n_columns = 1)

plot(p, n_columns = 2)

Created on 2022-11-05 with reprex v2.0.2

@bwiernik How would you describe the p value here? What would be a proper formulation? The point estimate has a p-value of 1, but of course we cannot say that with 100% prob. the point estimate is correct. Rather, I would say that "With a probability of p, the range of estimates most compatible with our data, given a correct model, is from [CI_low] to [CI_high]"? I think we need to add something like this to the docs of the parameters::p_function().

@strengejacke These look so good!