Handling a large number of revdep failures

hadley opened this issue

Very occasionally we need to do a package release that causes a large number (potentially hundreds) of packages to break. This should only ever be done with a major version bump, when we expect the impact on data analysis code to be minimal and the long-term payoff to be extremely high. It may seem strange for an update to break much package code but little data analysis code, but this often arises when changes in subtle edge cases cause unit tests to fail. Obviously, a large number of failures needs to be investigated carefully, and this is not something we should do routinely, but it is occasionally the right thing to do (and, we believe, better than incrementing some number in the package name). This document describes our recommended practices.

See also #114, since we are more obliged to help others when it's our fault.

Team effort

Handling a large number of revdep failures has to be a team effort (both internally and with members of the R community). I think it's easiest to manage that effort with a shared Google Sheet that records failing packages, who's working on them, and any resolution.

For example:

  • dplyr 1.0.0
  • ggplot2 3.0.0

The main sheet records:

  • package name
  • current status (see below)
  • link to package home on github
  • link to issue summary in revdep/problems.md
  • number of downloads in last month (used to roughly prioritise triage)
  • who's working on it
  • rough notes on root cause
  • link to fix - either a PR or a commit by the author (if fixed in devel but not on CRAN, ...)
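
For concreteness, here is a minimal sketch of creating such a sheet programmatically with googlesheets4. The sheet name and the column names are illustrative assumptions, not a fixed schema:

```r
# Minimal sketch: create the shared tracking sheet with googlesheets4.
# Column names mirror the list above but are purely illustrative.
library(googlesheets4)
library(tibble)

failures <- tibble(
  package   = character(),  # package name
  status    = character(),  # current status
  github    = character(),  # link to package home on GitHub
  problems  = character(),  # link to summary in revdep/problems.md
  downloads = integer(),    # downloads in the last month (for triage)
  assignee  = character(),  # who's working on it
  notes     = character(),  # rough notes on root cause
  fix       = character()   # link to fix (PR, commit, ...)
)

# Creates an empty spreadsheet in your Google Drive (requires auth).
ss <- gs4_create("revdep-failures", sheets = list(failures = failures))
```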

This sheet is for editing by humans. It's also accompanied by two sheets generated with code.

Those generated sheets are plumbed into the main sheet using VLOOKUP formulas, so the main sheet stays current whenever they are regenerated.
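
As a sketch of one such generated sheet: given a vector of failing package names, download counts can be pulled with cranlogs and written to their own sheet. The package vector, the sheet name "downloads", and the aggregation here are assumptions for illustration:

```r
# Sketch: regenerate a "downloads" sheet from cranlogs.
library(cranlogs)
library(dplyr)
library(googlesheets4)

# Hypothetical list of failing packages; in practice, read it from the main sheet.
failing_pkgs <- c("pkgA", "pkgB")

# Sum daily counts over the last month to get one number per package.
downloads <- cran_downloads(failing_pkgs, when = "last-month") |>
  group_by(package) |>
  summarise(downloads = sum(count))

# Overwrite the generated sheet in place ('ss' is the spreadsheet created above).
sheet_write(downloads, ss = ss, sheet = "downloads")
```

In the main sheet, a formula such as =VLOOKUP(A2, downloads!A:B, 2, FALSE) then pulls the count for the package named in A2, so refreshing the generated sheet updates the main sheet automatically.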

This strategy is useful once you get above ~20 failures, since beyond that it's too hard to hold everything in your head (or in checklist bullets in the release issue). 20 failures is not uncommon for packages with thousands of revdeps.