The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint solves the problems that occur when you don't have the correct versions of R packages. Since packages get updated on CRAN all the time, it can be difficult to recreate an environment where all your packages are consistent with some earlier state.
To solve this, checkpoint allows you to install packages from a specific snapshot date. In other words, checkpoint makes it possible to install package versions from a specific date in the past, as if you had a CRAN time machine.
With the checkpoint package, you can easily:
- Write R scripts or projects using package versions from a specific point in time;
- Write R scripts that use older versions of packages, or packages that are no longer available on CRAN;
- Install packages (or package versions) visible only to a specific project, without affecting other R projects or R users on the same system;
- Manage multiple projects that use different package versions;
- Share R scripts with others that will automatically install the appropriate package versions;
- Write and share R code whose results can be reproduced, even if new (and possibly incompatible) package versions are released later.
Using checkpoint is simple:
- The checkpoint package has only a single function, checkpoint(), where you specify the snapshot date.
- Example: checkpoint("2015-01-15") instructs R to install and use only package versions that existed on January 15, 2015.
To write R code for reproducibility, simply begin your master R script as follows:
library(checkpoint)
checkpoint("2015-01-15") ## or any date in YYYY-MM-DD format after 2014-09-17
Choose a snapshot date that includes the package versions you need for your script (or today's date, to get the latest versions). Any package version published since September 17, 2014 is available for use.
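Before calling checkpoint(), it can help to confirm that a candidate date is well formed and falls within the range of available snapshots. The following is a minimal sketch in base R; is_valid_snapshot_date() and first_snapshot are hypothetical names used for illustration and are not part of the checkpoint package. It assumes the earliest usable date is September 17, 2014, inclusive, and that dates in the future have no snapshot yet.

```r
# Hypothetical helper (not part of the checkpoint package): checks that a
# snapshot date is a YYYY-MM-DD string within the assumed snapshot range.
first_snapshot <- as.Date("2014-09-17")  # assumed to be inclusive

is_valid_snapshot_date <- function(x, today = Sys.Date()) {
  # Require a single string with the strict YYYY-MM-DD shape before parsing
  if (!is.character(x) || length(x) != 1L ||
      !grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) {
    return(FALSE)
  }
  # as.Date() yields NA for impossible dates such as 2015-02-30
  d <- as.Date(x, format = "%Y-%m-%d")
  !is.na(d) && d >= first_snapshot && d <= today
}

is_valid_snapshot_date("2015-01-15")  # TRUE
is_valid_snapshot_date("2014-09-16")  # FALSE: before the first snapshot
```

A check like this only guards against malformed or out-of-range dates; it does not confirm that a snapshot for that day is actually reachable on the server.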
Sharing your R analysis reproducibly can be as easy as emailing a single R script. Begin your script with the following commands:
- Load the checkpoint package using library(checkpoint)
- Ensure you specify checkpoint() with your checkpoint date, e.g. checkpoint("2014-10-01")
Then send this script to your collaborators. When they run this script on their machine, checkpoint will perform the same steps of installing the necessary packages, creating the checkpoint snapshot folder and producing the same results.
When you create a checkpoint, the checkpoint() function performs the following:
- Creates a snapshot folder to install packages. This library folder is located at ~/.checkpoint
- Scans your project folder for all packages used. Specifically, it searches for all instances of library() and require() in your code.
- Installs these packages from the MRAN snapshot into your snapshot folder using install.packages()
- Sets options for your CRAN mirror to point to a MRAN snapshot, i.e. modifies options(repos)
This means the remainder of your script will run with the packages from a specific date.
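To make the scanning step more concrete, here is a rough sketch of how library() and require() calls could be picked out of source lines using base R regular expressions. This is a simplified stand-in, not the package's actual implementation: find_packages_in_code() is a hypothetical name, and the pattern ignores cases such as commented-out calls or packages referenced with pkg::fun().

```r
# Hypothetical helper: a simplified illustration of scanning code for
# library() and require() calls. Not the checkpoint package's own scanner.
find_packages_in_code <- function(lines) {
  # Match library(pkg), library("pkg"), require(pkg), require('pkg')
  pattern <- "\\b(?:library|require)\\(\\s*[\"']?[A-Za-z][A-Za-z0-9.]*[\"']?"
  calls <- unlist(regmatches(lines, gregexpr(pattern, lines, perl = TRUE)))
  # Strip the function name, opening bracket and optional opening quote
  pkgs <- sub("^(?:library|require)\\(\\s*[\"']?", "", calls, perl = TRUE)
  # Strip an optional closing quote, leaving only the package name
  pkgs <- sub("[\"']$", "", pkgs)
  unique(pkgs)
}

code <- c("library(MASS)", "library(foreach)", "x <- 1", "require(\"MASS\")")
find_packages_in_code(code)
```

To apply the same idea to a project folder, the lines could be gathered with list.files() and readLines() before being passed to the helper.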
To achieve reproducibility, we create a complete snapshot of CRAN once a day on the "Managed R Archived Network" (MRAN) server. At midnight (UTC), MRAN mirrors all of CRAN and saves a snapshot. (MRAN has been storing daily snapshots since September 17, 2014.) This allows you to "go back in time" by installing packages exactly as they were on a given snapshot date.
Together, the checkpoint package and the MRAN server act as a CRAN time machine. The checkpoint() function installs the packages to a local library exactly as they were at the specified point in time. Only those packages are available to your session, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint() can ensure the reproducibility of your scripts or projects at any time.
To revert to your default CRAN mirror and access globally-installed packages, simply restart your R session.
# Create temporary project and set working directory
example_project <- paste0("~/checkpoint_example_project_", Sys.Date())
dir.create(example_project, recursive = TRUE)
oldwd <- setwd(example_project)
# Write dummy code file to project
cat("library(MASS)", "library(foreach)",
    sep = "\n",
    file = "checkpoint_example_code.R")
# Create a checkpoint by specifying a snapshot date
library(checkpoint)
checkpoint("2014-10-01")
# Check that CRAN mirror is set to MRAN snapshot
getOption("repos")
# Check that library path is set to ~/.checkpoint
.libPaths()
# Check which packages are installed in checkpoint library
installed.packages()
# Clean up: restore the working directory first, then delete the project folder
setwd(oldwd)
unlink(example_project, recursive = TRUE)
To install checkpoint directly from CRAN, use:
install.packages("checkpoint")
library("checkpoint")
To install checkpoint directly from GitHub, use the devtools package. In your R session, try:
install.packages("devtools")
devtools::install_github("RevolutionAnalytics/checkpoint")
library("checkpoint")
Although checkpoint will scan for dependencies in .Rmd files if knitr is installed, it does not automatically install the knitr or rmarkdown packages.
To build your .Rmd files, you will have to add a script in your project that explicitly loads all the packages required to build your .Rmd files.
A line like the following may be sufficient:
library(rmarkdown)
This should automatically resolve dependencies on the packages knitr, yaml and htmltools.
To build your rmarkdown file, use a call to rmarkdown::render(). For example, to build a file called example.Rmd, use:
rmarkdown::render("example.Rmd")
https://github.com/RevolutionAnalytics/checkpoint/wiki
Post an issue on the Issue tracker at https://github.com/RevolutionAnalytics/checkpoint/issues
https://github.com/RevolutionAnalytics/checkpoint-server
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.