- Intro to ggplot2
- Visualizing distributions
- Changing the appearance of plots
- Graphing distributions across groups
- Boxplots and violin plots
- Scatter plots
- Line plots
- Introduction to dplyr
- Heatmaps
- Alluvial diagrams
- Primer on tidyr
- Bar plots
- Advanced bar plots and lubridate
- Coefficient plots
- Predictive probabilities
- Maps using geom_polygon()
- Introduction to sf
- Choropleth maps
- Stamen map server
- Plotting density on a map
You should have both R and RStudio installed on your machine.
In this workshop, we will be using R together with the integrated development environment (IDE) RStudio. In addition to offering a "cleaner" programming environment than the basic R editor, RStudio offers a large number of added functionalities for integrating code into documents, built-in tools, and web development.
There are no formal prerequisites for this workshop. However, I am assuming that participants have a basic understanding of R programming, in particular:
- Setting a working directory,
- Installing and loading packages,
- Reading and writing data,
- Basic data formats (scalar, vector, data frame),
- Basic variable types (numeric, character, factor, logical),
- Basic vector and data frame operations, such as subsetting, transforming variables, merging, reshaping, etc.
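As a quick self-check, the prerequisites listed above can be sketched in a few lines of base R. This is only an illustrative sketch: the file name `example.csv`, the toy data frames, and the use of a temporary folder as the working directory are assumptions made for this example, not part of the workshop material.

```r
# Setting a working directory (a temporary folder stands in for a project folder).
old_wd <- setwd(tempdir())

# Installing and loading packages.
# install.packages("dplyr")  # run once per machine; commented out here
library(stats)               # a package that ships with R

# Basic data formats: scalar, vector, data frame.
x_scalar <- 3.14
x_vector <- c(1, 2, 3)
df <- data.frame(
  id    = 1:3,
  group = c("a", "b", "a"),
  score = c(2.5, 3.0, 4.5),
  stringsAsFactors = FALSE
)

# Reading and writing data.
write.csv(df, "example.csv", row.names = FALSE)
df_read <- read.csv("example.csv")

# Basic variable types: numeric, character, factor, logical.
df$group_f <- factor(df$group)
df$high    <- df$score > 2.75
sapply(df, class)

# Subsetting and transforming variables.
df_a <- df[df$group == "a", ]
df$score_sq <- df$score^2

# Merging with a second (toy) data frame.
labels <- data.frame(
  group = c("a", "b"),
  label = c("Group A", "Group B"),
  stringsAsFactors = FALSE
)
df_merged <- merge(df, labels, by = "group")

# Restore the original working directory.
setwd(old_wd)
```

If each of these lines looks familiar, you should be well prepared for the workshop.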
If you are unfamiliar with R or would like to brush up on your skills, take a look at my intro to data management workshop. The first two sessions go over basic R functionality and programming principles. The latter four sessions introduce data management operations using packages from the tidyverse suite. I also recommend the R for Data Science website and/or book as a great resource for learning R and data management.
The key to learning R is: Google! This workshop will give you an overview of data visualization in R, but to become truly proficient you will have to actively use it yourself, troubleshoot, ask questions, and google! The R mailing list and other help pages such as StackOverflow offer a rich archive of questions and answers by the R community. For example, if you google "recode variable in r", the first page of the search results will show a variety of useful websites explaining how to do this. Also, don't be surprised if you find a variety of different ways to execute the same task.
RStudio has a useful help menu. In addition, you can get information on any function or integrated data set in R through the console, for example:
??geom_tile
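A few related ways of reaching the documentation from the console are sketched below. The examples use base R functions and the built-in `mtcars` data set so that they run without extra packages; these topics are chosen for illustration only. Note that a single `?` only finds topics in packages that are currently loaded, while `??` searches the help pages of all installed packages.

```r
# Help page for a function in a loaded package (base R here).
?mean
help("mean")

# Search the help pages of all installed packages, loaded or not.
??geom_tile
help.search("geom_tile")

# Help page for a built-in data set.
?mtcars

# Show a function's arguments without opening the help page.
args(sd)

# Run the examples from a function's help page.
example("mean")
```

To open the help page for `geom_tile` with a single `?`, load its package first with `library(ggplot2)`.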
The teaching material is inspired by a course on Statistical Computing and Data Visualization by Abbass Sharif.
Packages used: dplyr, fivethirtyeight, gapminder, ggalluvial, ggmap, ggplot2, hflights, lubridate, maps, pscl, raster, readr, rmapshaper, sf, tidyr, tmaptools, viridis.
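One possible way to check which of these packages are still missing on your machine is sketched below. The variable names are made up for this example, and the sketch assumes that all packages are available from your configured CRAN mirror, which may not hold for every package on every R version.

```r
# Packages used in the workshop, as listed above.
workshop_packages <- c(
  "dplyr", "fivethirtyeight", "gapminder", "ggalluvial", "ggmap",
  "ggplot2", "hflights", "lubridate", "maps", "pscl", "raster",
  "readr", "rmapshaper", "sf", "tidyr", "tmaptools", "viridis"
)

# Compare against the packages already installed in your library.
to_install <- setdiff(workshop_packages, rownames(installed.packages()))
print(to_install)

# Install only what is missing. The interactive() guard means that
# sourcing this as a script does not trigger any downloads.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Some of the spatial packages (for example sf) may require additional system libraries, depending on your operating system.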
Additional data sources used:
- Armed Conflict Location & Event Data Project (ACLED), https://www.acleddata.com
- United States Environmental Protection Agency Air Data, https://www.epa.gov/outdoor-air-quality-data
Creator and instructor: Therese Anders (tanders@usc.edu)
This project is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Feel free to use/adapt the teaching materials, but do not use them commercially/sell them, and share them under the same license.