UC Davis DataLab
Summer 2022
Instructors: Wesley Brooks, Nick Ulle
Maintainer: Nick Ulle <naulle@ucdavis.edu>
This 4-part workshop series provides an introduction to using the R programming language for reproducible data analysis and scientific computing. Topics include programming basics, how to work with tabular data, how to break down programming problems, and how to organize code for clarity and reproducibility.
After this workshop, learners will be able to load tabular data sets into R, compute simple summaries and visualizations, do common data-tidying tasks, write reusable functions, and identify where to go to learn more.
No prior programming experience is necessary. All learners will need access to an internet-connected computer and the latest versions of Zoom, R, and RStudio.
- Reader
- Event Page
The course reader is a live webpage, hosted through GitHub, where you can enter curriculum content and post it to a public-facing site for learners.
To make alterations to the reader:
- Run `git pull`, or if it's your first time contributing, see the Setup section of this document.
- Edit an existing chapter file or create a new one. Chapter files are R Markdown files (`.Rmd`) at the top level of the repo. Enter your text, code, and other information directly into the file. Make sure your file:
  - Follows the naming scheme `##_topic-of-chapter.Rmd` (the only exception is `index.Rmd`, which contains the reader's front page).
  - Begins with a first-level header (like `# This`). This will be the title of your chapter. Subsequent section headers should be second-level headers (like `## This`) or below.
  - Uses caching for resource-intensive code (see the Resource-intensive Code section of this document).

  Put any supporting resources in `data/` or `img/`. For large files, see the Large Files section of this document. You do not need to add resources generated by your R code (such as plots). The next step saves these in `docs/` automatically.
- Run the script `knit.R` to regenerate the HTML files in the `docs/` directory. You can do this in the shell with `./knit.R` or in R with `source("knit.R")`.
- When you're finished, `git add`:
  - Any files you edited directly
  - Any supporting media you added to `data/` or `img/`
  - The entire `docs/` directory
  - The entire `_bookdown_files/` directory (contains the R Markdown cache)
  - The `.gitattributes` file (if you added a large file)

  Then `git commit` and `git push`. The live web page will update automatically after 1-10 minutes.
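
The chapter naming rule can be checked before knitting. The following is a minimal shell sketch, not a script that ships with this repo: `is_chapter_name` is a hypothetical helper, and it assumes that `##` in the naming scheme stands for exactly two digits.

```shell
# Hypothetical helper (not part of this repo): succeed if a file name
# follows the chapter naming scheme, assuming "##" means two digits.
is_chapter_name() {
  case "$1" in
    index.Rmd) return 0 ;;             # the one documented exception
    [0-9][0-9]_?*.Rmd) return 0 ;;     # e.g. 01_topic-of-chapter.Rmd
    *) return 1 ;;
  esac
}

# Report any top-level .Rmd file whose name does not fit the scheme.
for f in *.Rmd; do
  [ -e "$f" ] || continue              # no .Rmd files in this directory
  is_chapter_name "$f" || echo "Unexpected chapter file name: $f"
done
```

Run from the top level of the repo, the loop prints nothing when every chapter file name fits the scheme.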
If one of your code chunks takes a lot of time or memory to run, consider caching the result, so the chunk won't run every time someone knits the reader. To cache a code chunk, add `cache=TRUE` in the chunk header. It's best practice to label cached chunks, like so:
```{r YOUR_CHUNK_NAME, cache=TRUE}
# Your code...
```
Cached files are stored in the `_bookdown_files/` directory. If you ever want to clear the cache, you can delete this directory (or its subdirectories). The cache will be rebuilt the next time you knit the reader.
Beware that caching doesn't work with some packages, especially packages that use external libraries. Because of this, it's best to leave caching off for code chunks that are not resource-intensive.
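Clearing the cache amounts to deleting one directory and knitting again. A minimal shell sketch follows; `clear_reader_cache` is a hypothetical helper, not a script in this repo, and it takes the repo's top-level directory as an optional argument (defaulting to the current directory).

```shell
# Hypothetical helper (not part of this repo): delete the R Markdown cache
# so that every cached chunk is re-run the next time the reader is knit.
clear_reader_cache() {
  repo_dir="${1:-.}"                   # top level of the repo
  cache_dir="$repo_dir/_bookdown_files"
  if [ -d "$cache_dir" ]; then
    rm -rf "$cache_dir"
    echo "Removed $cache_dir"
  else
    echo "No cache found at $cache_dir"
  fi
}

# Typical use from the top level of the repo:
#   clear_reader_cache
#   ./knit.R
```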
If you want to include a large file (say over 1 MB), you should use git LFS. You can register a large file with git LFS with the shell command:
git lfs track YOUR_FILE
This command updates the `.gitattributes` file at the top level of the repo. To make sure the change is saved, you also need to run:
git add .gitattributes
Now that your large file is registered with git LFS, you can add, commit, and push the file with git the same way you would any other file, and git LFS will automatically intercede as needed.
GitHub provides 1 GB of storage and 1 GB of monthly bandwidth free per repo for large files. If your large file is more than 50 MB, check with the other contributors before adding it.
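To spot files that may need git LFS before committing, one option is to list everything over the size guideline. This is a minimal shell sketch: `find_large_files` is a hypothetical helper, not part of this repo, and the 1 MB guideline is approximated as 1024 KiB.

```shell
# Hypothetical helper (not part of this repo): list files larger than
# about 1 MB (approximated here as 1024 KiB) under the given directories.
find_large_files() {
  [ "$#" -gt 0 ] || set -- .           # default to the current directory
  # Skip git's own storage; everything else is a candidate.
  find "$@" -type f -size +1024k ! -path '*/.git/*'
}

# Typical use from the top level of the repo:
#   find_large_files data img
# Each file listed can then be registered with:
#   git lfs track PATH_TO_FILE
#   git add .gitattributes
```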
This repo uses Git Large File Storage (git LFS) for large files. If you don't have git LFS installed, download it and run the installer. Then in the shell (in any directory), run:
git lfs install
Then your one-time setup of git LFS is done. Next, clone this repo with `git clone`. The large files will be downloaded automatically with the rest of the repo.
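Before cloning, it can help to confirm that git LFS is actually available. A minimal shell sketch follows; `check_git_lfs` is a hypothetical helper, not part of this repo, and it relies on `git lfs version`, which prints the installed git LFS version.

```shell
# Hypothetical helper (not part of this repo): report whether git LFS
# is available to git on this machine.
check_git_lfs() {
  if git lfs version >/dev/null 2>&1; then
    echo "git LFS is installed: $(git lfs version)"
  else
    echo "git LFS not found; install it, then run: git lfs install" >&2
    return 1
  fi
}

# Typical use:
#   check_git_lfs && git lfs install
```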
This repo uses renv for package management. Install renv according to the installation instructions on its website.
Then open an R session at the top level of the repo and run:
renv::restore()
This will download and install the correct versions of all the required packages to renv's package library. This is separate from your global R package library and will not interfere with other versions of packages you have installed.
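The same restore can also be run from the shell without opening an interactive R session. This is a minimal sketch: `restore_packages` is a hypothetical helper, not part of this repo, and it assumes the repo keeps its lockfile at renv's default location, `renv.lock`, at the top level.

```shell
# Hypothetical helper (not part of this repo): run renv::restore() for the
# repo at the given path (defaults to the current directory).
restore_packages() {
  repo_dir="${1:-.}"
  # Assumption: the lockfile is at renv's default location, renv.lock.
  if [ ! -f "$repo_dir/renv.lock" ]; then
    echo "No renv.lock in $repo_dir; run this from the top level of the repo" >&2
    return 1
  fi
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found; install R first" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$repo_dir" && Rscript -e 'renv::restore()')
}

# Typical use from the top level of the repo:
#   restore_packages
```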