Tidyverse workshop for Northwestern Data Science and Programming Workshops Summer 2019

Data Manipulation with the Tidyverse

Tidyverse workshop for Northwestern Data Science and Programming Workshops Fall 2019

Instructor: Katie Evans

What is the Tidyverse?

  • Collection of packages for data manipulation, exploration, and visualization that share a common syntax
  • Intended to make data scientists more productive by guiding them through workflows
  • Packages are designed to work together, so the output of one tool can feed directly into the next

Topics to cover

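  • dplyr: The dplyr package is one of the most widely used R packages for data manipulation. It provides a small set of verbs (such as filter, select, mutate, group_by, and summarise), and one of its greatest advantages is that you can chain those verbs together with the pipe operator (%>%).
  • tidyr: The tidyr package complements dplyr. It reshapes and cleans data (for example, pivoting between wide and long layouts or handling missing values) so that dplyr verbs can be applied during manipulation and pre-processing.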
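
As a rough illustration of how the two packages fit together, here is a minimal sketch. The `scores` data and its column names are made up for this example and are not part of the workshop materials. It also assumes tidyr 1.0.0 or later, where `pivot_longer()` is available; older versions would use `gather()` instead.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per student, one column per exam
scores <- tibble(
  student = c("Ana", "Ben", "Cy"),
  midterm = c(82, 74, 91),
  final   = c(88, 69, 95)
)

# tidyr: reshape from wide to long, giving one row per student-exam pair
scores_long <- scores %>%
  pivot_longer(cols = c(midterm, final),
               names_to = "exam",
               values_to = "score")

# dplyr: chain verbs with the pipe to filter, group, summarise, and sort
exam_summary <- scores_long %>%
  filter(score >= 70) %>%
  group_by(exam) %>%
  summarise(mean_score = mean(score), n = n()) %>%
  arrange(desc(mean_score))

print(exam_summary)
```

Each step takes the result of the previous one as its first argument, which is why the pipeline reads top to bottom without intermediate variables.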

Other Tidyverse packages to check out:

  • readr: The readr package reads rectangular data files (such as CSV and TSV) into R as tibbles and writes data frames back out to files.
  • stringr: The stringr package is used for strings. It provides a cohesive set of functions designed to make working with strings as easy as possible.
  • ggplot2: The ggplot2 package is a widely used tool for producing charts and visualizations. Plots are built up in layers from data, aesthetic mappings, and geoms.
  • lubridate: The lubridate package simplifies working with dates and times in R, from converting strings to dates to calculating the hours between two time points.
  • purrr: The purrr package provides a consistent toolkit for functional programming in R. The functions provided by purrr, such as the map family, can replace many explicit loops with a single line of code.
  • forcats: The forcats package is dedicated to working with categorical variables (factors).
  • broom: The broom package takes the messy output of built-in R functions, such as model fits, and turns it into tidy data frames.
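
To give a feel for a few of these packages, here is a short sketch with one small call from each. The input values are invented for illustration, and the calls shown are only a small sample of what each package offers. The broom line uses the built-in mtcars dataset.

```r
library(stringr)
library(lubridate)
library(purrr)
library(forcats)
library(broom)

# stringr: detect a pattern and change case
has_an <- str_detect(c("apple", "banana"), "an")   # FALSE TRUE
shout  <- str_to_upper("tidyverse")                # "TIDYVERSE"

# lubridate: parse strings into date-times, then compute the hours between them
start <- ymd_hms("2019-10-01 08:00:00")
end   <- ymd_hms("2019-10-02 14:30:00")
hours_between <- as.numeric(difftime(end, start, units = "hours"))  # 30.5

# purrr: replace a for loop with a single map call
squares <- map_dbl(1:5, ~ .x^2)                    # 1 4 9 16 25

# forcats: put factor levels in a meaningful order
sizes <- factor(c("small", "large", "medium", "small"))
sizes <- fct_relevel(sizes, "small", "medium", "large")

# broom: turn a model fit into a tidy data frame, one row per coefficient
fit_summary <- tidy(lm(mpg ~ wt, data = mtcars))
```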

Tidyverse Resources