Base R plot support

Question

Base R plot support

kylebutts opened this issue a year ago · 10 comments

@grantmcdermott I'm wondering if I could pick your brain about base R plot support (or more likely, using plot2). I've added support for fixest_multi usage (multiple y variables and/or split/fsplit samples). Now you can replace feols with fwlplot and it will just work :-)

I have the following dataset ready to plot from previous calls (see bottom for example of what the dataset would look like). If there is no split/fsplit, then sample is a character column of empty strings "".

For plotting, if there is both I do facet_grid(rows = var, cols = sample) if the estimation is split and multiple y variables. If there is only 1 of split or multiple y, then I use facet_wrap. Otherwise, what we have.

I saw your most recent comment here: grantmcdermott/tinyplot#76, and was wondering if this is something that might be easily possible in plot2?

         sample   idx    var       y_resid       x_resid           fit           lwr           upr
         <char> <int> <char>         <num>         <num>         <num>         <num>         <num>
 7:     cyl = 4     9   disp  2.291529e+01  4.627560e-01  2.286022e+01  1.229056e+01  3.342988e+01
 8:     cyl = 4     8   disp  5.531900e+01  1.039647e+00  5.135873e+01  3.308987e+01  6.962759e+01
 9:     cyl = 6     1   disp  8.526513e-12 -1.275000e-01  8.540724e-12  8.465837e-12  8.615610e-12
10:     cyl = 6    10   disp  8.554935e-12  2.042810e-14  8.540724e-12  8.497488e-12  8.583959e-12
11:     cyl = 6    11   disp  8.554935e-12  2.042810e-14  8.540724e-12  8.497488e-12  8.583959e-12
12:     cyl = 6     2   disp  8.526513e-12  1.275000e-01  8.540724e-12  8.465837e-12  8.615610e-12
13: Full sample     1   disp  1.202142e+01 -3.645965e-01 -1.538589e+01 -2.786976e+01 -2.902013e+00
23: Full sample     9   disp  1.108624e+01  4.346205e-01  1.834088e+01  4.748985e+00  3.193278e+01
24: Full sample     8   disp  5.716882e+01  1.066898e+00  4.502284e+01  1.904455e+01  7.100112e+01
25:     cyl = 4    19    mpg -8.817797e-01 -3.726589e-01  1.349912e+00 -1.675924e+00  4.375747e+00
31:     cyl = 4     9    mpg -6.758828e-01  4.627560e-01 -1.676277e+00 -5.002769e+00  1.650214e+00
32:     cyl = 4     8    mpg -5.066455e+00  1.039647e+00 -3.765994e+00 -9.515585e+00  1.983596e+00
33:     cyl = 6    10    mpg  7.000000e-01  2.042810e-14  1.477903e-12 -8.894343e+00  8.894343e+00
34:     cyl = 6    11    mpg -7.000000e-01  2.042810e-14  1.477903e-12 -8.894343e+00  8.894343e+00
35:     cyl = 6     2    mpg  8.668621e-13  1.275000e-01  8.668021e-13 -1.257850e+01  1.257850e+01
36: Full sample     3    mpg -1.091027e+00 -3.594840e-01  1.343803e+00 -8.147531e-01  3.502358e+00
37: Full sample    20    mpg  4.979079e+00 -3.419456e-01  1.283776e+00 -8.296313e-01  3.397184e+00
         sample   idx    var       y_resid       x_resid           fit           lwr           upr

Answer 1 · 2023-10-12T17:42:29.000Z

Hi Kyle.

The tl;dr read is that, yes, facets are on the roadmap for plot2. I haven't written down the formal implementation, but I believe it should work following the basic format that I outlined in the comment you linked too. TBH I hadn't considered facet_grid support yet and was mostly thinking i.t.o. facet_wrap... But I guess both should be possible.

One UI thing I'd like feedback on is potential formula method support and implementation. My current thinking is that we could denote facets after a second vertical bar, e.g. y ~ x | colour_var | facet_var. You could expand to grids via a two-part formula, y ~ x | colour_var | grid_var1 ~ grid_var2.

I can't make any promises RE timeline. But my goal is to finish this type of feature support in time for a CRAN submission before 2023 is out. I may go ahead and merge the current PR since that's the current bottleneck for further development rn.

Answer 2 · 2023-10-12T20:19:28.000Z

OTOH if you're in a rush, you may only need plot2 if you are worried about a nice legend (potentially irrelevant here?). Quick mockup below using your dataset. I'm still using plot2 here, but only for the ribbon part since I'm lazy I don't feel like going the manual polygon route.

dat = data.table::fread('~/Desktop/kyle.csv')
sdat = split(dat, list(dat$var, dat$sample))

par(mfcol = c(length(unique(dat$var)), length(unique(dat$sample))))

invisible(lapply(
    seq_along(sdat),
    \(i) {
        # plot points
        with(
            sdat[[i]],
            plot(
                x = x_resid, y = y_resid,
                pch = 16,
                xlim = range(dat$x_resid), ylim = range(c(dat$upr, dat$lwr)),
                frame.plot = FALSE, axes = FALSE, ann = FALSE
            )
        )
        # add grid
        grid()
        # facet details for some label calcs
        facets = par("mfcol") 
        n_facets = Reduce(`*`, par("mfcol"))
        # x axis
        if (i %% facets[1] == 0) {
            axis(1, lty = 0)
            title(xlab = "Residualized x")
        }
        # y axis
        if (i %in% 1:facets[1]) {
            axis(2, lty = 0, las = 1)
            title(ylab = "Residualized y")
        }
        # 1st facet labels
        if (i %% facets[1] == 1) {
            mtext(sub(".*\\.", "", names(sdat))[i], side = 3)
        }
        # 2 facet labels
        if (i %in% c(n_facets, n_facets-1)) {
            mtext(sub("\\..*", "", names(sdat))[i], side = 4, crt = 180)
        }
        
        # add best fit ribbon (here, using plot2)
        with(
            sdat[[i]], 
            plot2::plot2(
                x = x_resid, y = fit, ymin = lwr, ymax = upr, 
                type = "ribbon", col = "darkgreen",
                add = TRUE
                )
        )
    }
))

^{Created on 2023-10-12 with reprex v2.0.2}

Answer 3 · 2023-10-13T15:56:54.000Z

This is really great Grant! I didn't know about par(mfcol). Really should dive into base R plotting at some point...

Answer 4 · 2023-10-13T20:04:55.000Z

Completed by 71eb918

Answer 5 · 2024-01-26T23:21:40.000Z

Just a HU that plot2 v0.0.4 (just released on R-universe) includes full-featured facet support. One nice thing compared to your current bespoke solution is that it correctly spaces and scales the axes titles relative to the individual plot windows. So you don't end up with the left-most plots being "thinner" as a result of axes printing.

library(plot2)

# resids dataset obtained by running debugonce(fwlplot:::plot_resids_base_r)
# and writing to disk
resids = read.csv('~/Desktop/resids.csv')

resids |>
  with(plot2(
    x = x_resid, y = y_resid,
    facet = sample ~ var,
    facet.args = list(bg = "grey90"),
    pch  = 21, fill = adjustcolor(1, alpha = 0.1),
    frame = FALSE, grid = TRUE
  ))
resids |>
  with(plot2(
    x = x_resid, y = fit, ymin = lwr, ymax = upr,
    type = "ribbon", col = "dodgerblue",
    facet = sample ~ var,
    add = TRUE
  ))

^{Created on 2024-01-26 with reprex v2.0.2}

(Obvs looks a bit silly here b/c of reprex's squashed plot aspect ratio. But you can see that things at least look proportional.)

We could probably port that logic here, or I can flag once plot2 hits CRAN.

Answer 6 · 2024-01-27T00:54:12.000Z

Thanks for the heads up. This is great. Since there's going to be basically 0 more features added on this, I'll port over to plot2's feature and then push to CRAN when you do.

FYI, with reprex you can use # %%/#' since it uses knitr::spin under the hood. e.g.

library(fixest)
library(plot2)

aq = airquality
aq$Month = factor(month.abb[aq$Month], levels = month.abb[5:9])

mod = lm(Temp ~ 0 + Month / Day, data = aq)
aq = cbind(aq, predict(mod, interval = "confidence"))

# %% fig.width = 8, fig.height = 6, dpi = 300
# Plot the original points 
with(
  aq,
  plot2(
    x = Day, y = Temp,
    by = Month,
    facet = "by", facet.args = list(bg = "grey90"),
    palette = "dark2",
    grid = TRUE, frame = FALSE, ylim = c(50, 100),
    main = "Actual and predicted air temperatures"
  )
)

yields

library(fixest)
library(plot2)
#> Warning: package 'plot2' was built under R version 4.3.2

aq = airquality
aq$Month = factor(month.abb[aq$Month], levels = month.abb[5:9])

mod = lm(Temp ~ 0 + Month / Day, data = aq)
aq = cbind(aq, predict(mod, interval = "confidence"))

# Plot the original points 
with(
  aq,
  plot2(
    x = Day, y = Temp,
    by = Month,
    facet = "by", facet.args = list(bg = "grey90"),
    palette = "dark2",
    grid = TRUE, frame = FALSE, ylim = c(50, 100),
    main = "Actual and predicted air temperatures"
  )
)
# Add the model predictions to the same plot 
with(
  aq,
  plot2(
    x = Day, y = fit,
    ymin = lwr, ymax = upr,
    by = Month, facet = "by",
    type = "ribbon",
    palette = "dark2",
    add = TRUE
  )
)

^{Created on 2024-01-26 with reprex v2.1.0}

Answer 7 · 2024-02-17T21:48:35.000Z

Made the switch! I'll try to keep an eye out when tinyplot gets put on CRAN @grantmcdermott.