
This is the course website for BSDS 100: "Introduction to Data Science with R" at the University of San Francisco. Assignments, lecture notes, and open source code will all be available on this website.


BSDS 100: Intro to Data Science with R

Abbie M. Popa

Email: apopa@usfca.edu

Class Time: TR, 2:40 - 4:25 PM in Harney 430

Office Hours: TR, 1:20 - 2:20 PM in Harney 107B (James Wilson's Office)

Book: R for Data Science by Hadley Wickham and Garrett Grolemund

Syllabus: Link

Course Learning Outcomes

By the end of this course, you will be able to:

  • Proficiently wrangle, manipulate, and explore data using the R programming language
  • Use contemporary R libraries including ggplot2, tibble, tidyr, dplyr, knitr, and stringr
  • Visualize, present, and communicate trends in a variety of data types
  • Communicate results using R Markdown and R Shiny
  • Formulate data-driven hypotheses using exploratory data analysis and introductory model building techniques

Course Overview

Assessment

This course focuses on the basic techniques for making informed, data-driven decisions using the R programming language. It is not a statistics course, but it will give you the intuition to form hypotheses about complex questions by visualizing, wrangling, manipulating, and exploring data. Your grade will be based on the following components:

  • Attendance (20%): Attendance will be recorded, and you will lose points for every class you miss.
  • Assignments (40%): Computational assignments, to be completed in RStudio using the knitr package, will be given regularly throughout the semester.
  • Case Studies (20%): Applied case studies, to be completed in RStudio, will be assigned throughout the semester.
  • Final Project (20%): The final project is a computational case study that brings together the techniques learned throughout the semester. A description of the project will be provided around the midpoint of the semester.
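The four weights above add up to 100%. The R sketch below shows one way they could combine into a single course grade. It assumes the final grade is a plain weighted mean of component scores out of 100; the syllabus does not spell out the exact formula, so treat this as an illustration, not the official calculation. The names `course_weights`, `final_grade`, and `example_scores` are hypothetical, and the example scores are made up.

```r
# Syllabus weights for each graded component (they sum to 1, i.e. 100%).
course_weights <- c(
  attendance    = 0.20,
  assignments   = 0.40,
  case_studies  = 0.20,
  final_project = 0.20
)
stopifnot(isTRUE(all.equal(sum(course_weights), 1)))

# Hypothetical helper: weighted mean of component scores (each out of 100).
# Assumes the final grade is a plain weighted mean; not the official formula.
final_grade <- function(scores, weights = course_weights) {
  # Every weighted component needs a score, matched by name
  absent <- setdiff(names(weights), names(scores))
  if (length(absent) > 0) {
    stop("Missing scores for: ", paste(absent, collapse = ", "))
  }
  # Reorder scores to line up with the weights before averaging
  weighted.mean(scores[names(weights)], w = weights)
}

# Made-up scores, for illustration only
example_scores <- c(attendance = 95, assignments = 88,
                    case_studies = 90, final_project = 85)
final_grade(example_scores)
#> [1] 89.2
```

Matching scores by name rather than by position means the components can be listed in any order; `weighted.mean()` from base R's stats package does the weighting.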

Schedule

I will do my best to keep this schedule accurate and up to date. However, I reserve the right to change it as I deem necessary. Usually this will be due to the amount of material we are able to cover in class.

If you wish to view the notes I use during lecture, you can see them here; note that I often change them based on questions from class.

Introduction

Topic | Reading | Assignment | Due Date | In Class Code
Introduction - History of Data Science | Ch. 1 What is Data Science? | HW 1 | Thursday, 8/23 | Installing R, RStudio, and LaTeX
R and RStudio | - | HW 2 | Tuesday, 8/28 | In Class Code 2018-08-23
R Packages and RMarkdown | - | HW 3 | Tuesday, 9/4 | In Class Activity; In Class Activity Solution: Rmd Code; In Class Activity Solution: PDF Output; Class Code - Packages; Class Code - R File to PDF; Class Output - R File to PDF; Class Code - Rmd File to PDF; Class Output - Rmd File to PDF; Class Activity 2

Data Structures in R

Topic | Reading | Assignment | Due Date | In Class Code
Vectors, Matrices, and Arrays | - | HW 4 | Tuesday, 9/11 | Class Code Aug 30, 2018; Class Code Sept 4, 2018; Coding Challenge; Coding Challenge Answer Key; Real World Examples; Class Code Sept 6, 2018
Lists and Data Frames | Ch. 20 in R for Data Science | - | - | Class Code Sept 11, 2018; Coding Challenge
Tibbles | Ch. 10 in R for Data Science | HW 5 | Tuesday, 9/25 | Tibbles versus Data Frames Activity; Class Code Sept 13, 2018; Lecture Qs Sept 13, 2018; Class Code Sept 18, 2018; Tibbles versus Data Frames Activity Answer Key
Strings and Factors | Ch. 14.1 - 14.2 and 15 in R for Data Science | - | - | Class Code 180920; Class Code 180925

Ethics in Data Science

Topic | Reading | Assignment | Due Date | In Class Code
Ethics in Data Science | - | - | - | -

Data Wrangling and Plotting

Topic | Reading | Assignment | Due Date | In Class Code
Input and Output | - | HW 6 | Thursday, 10/18 | Factor and String Lab - Rmd; Factor and String Lab - PDF; Tree Data; Question Data; Class Code; singles data; triples data
Plotting in R | - | - | - | Plotting Lab as .Rmd; Plotting Lab as .PDF; Class Code 181009; Class Code 181011; Class Code 181018
Wrangling Data with tidyr | Ch. 12 in R for Data Science | - | - | Class Code 181025
Wrangling Data with dplyr - I | - | - | - | Class Code 181030; Wrangling Lab - Rmd; Wrangling Lab - PDF
Wrangling Relational Data with dplyr | - | HW 7 | Tuesday, 11/13 | Join Lab - Rmd; Join Lab - PDF; Class Code 181106
String Analysis | Ch. 14.3 - 14.7 in R for Data Science | - | - | Class Code 181108

Programming

Topic | Reading | Assignment | Due Date | In Class Code
Control Flow | Ch. 21 in R for Data Science | HW 8 | Tuesday, 11/27 | Class Code 181113
Writing Functions | Ch. 19 in R for Data Science | - | - | Function Lab - Rmd; Function Lab - PDF; Class Code 181127; Class Code 181129

Other

Topic | Reading | Assignment | Due Date | In Class Code
Extra Review | - | - | - | -

DS in the Wild

Example
Song Lyrics

Case Studies

Case Study | Data | In-Class Date | Due Date | Notes
CS 1 | Ramen Reviews | September 25th, 2018 | October 9th, 2018 | Case Study 1 Notes
CS 2 | hour data; day data | October 23, 2018 | November 8, 2018 | -

Final Project

Description | Due Date | Notes
Project Sign-Up will be through a Google Doc link on Canvas | November 1st at 9 AM | -
Final Project Description - UPDATED due to smoke | December 7 at 11:59 PM | Final Tips and Tricks; Report Tips; Presentation Tips

Important Dates

  • Monday, August 27th - Last day to add the class
  • Friday, September 7th - Census date. Last day to withdraw with tuition reversal
  • Tuesday, October 16th - Fall break! (no class)
  • Friday, November 2nd - Last day to withdraw
  • Thursday, November 22nd - Thanksgiving Holiday (no class)
  • Tuesday, December 4th - Last day of class
  • Friday, December 7th - Final Projects Due