Reproducible analysis of bigger, naturally-occurring datasets using R, R Markdown, and the tidyverse

Abstract

The availability of data on the web has opened up many resources for cognitive scientists who know how to deal with "medium data": big enough to crash Excel but small enough to load into memory. R is a powerful tool for statistical data analysis and reproducible research, and the "tidyverse" (an ecosystem of R packages for manipulating, analyzing, and visualizing data) provides many tools for working with this kind of data quickly and easily. In this tutorial, I'll walk through how to go from a database or tabular data file to an interactive plot with surprisingly little pain (and less code than you'd imagine). My focus will be on introducing a workflow that draws on a variety of tools and packages, including readr, dplyr, tidyr, and shiny. I'll assume basic familiarity with R and will use (but not spend too much time teaching) ggplot2. The tutorial features data from http://wordbank.stanford.edu.

Instructor: Michael C. Frank (Stanford University)

Michael C. Frank is Associate Professor of Psychology at Stanford University. He earned his BS in Symbolic Systems from Stanford University in 2005 and his PhD in Brain and Cognitive Sciences from MIT in 2010. He studies adults' language use and children's language learning, and how both interact with social cognition. His work uses behavioral experiments, computational tools, and novel measurement methods, including large-scale web-based studies, eye-tracking, and head-mounted cameras. He is a recipient of the FABBS Early Career Impact Award, his dissertation received the Glushko Prize from the Cognitive Science Society, and he has been recognized as a "rising star" by the Association for Psychological Science. For more, see http://langcog.stanford.edu.