Email: jdwilson4@usfca.edu
Time Line: Wednesday, August 23rd - Wednesday, October 11th
Class Time: M, W: 10:00 - 11:50 AM; 1:15 - 3:05 PM in Howard Room 527
Office Hours: M, W: 3:30 - 4:30 PM in Howard 5th floor Agora
Grader: Anshika Srivastava (asrivastava3@dons.usfca.edu)
- Applied Linear Regression Models, 4th Edition, by Kutner, Nachtsheim, and Neter (Required)
- Introduction to Statistical Learning (online)
- Elements of Statistical Learning (online)
- Linear Models with R by Julian Faraway
- Statistical Inference by Casella and Berger
By the end of this course, you will be able to:
- Formulate and apply classical simple and multiple linear regression models
- Formulate and test hypotheses and use models for both prediction and explanation
- Use R to load and manipulate data, fit regression models, and generate various outputs like ANOVA tables, confidence intervals for parameters, and diagnostic assessments
- Test whether the residuals of a fitted model conform to the assumptions that underlie classical regression
- Identify and manage outliers and influential observations
- Assess and address multicollinearity, heteroscedasticity, autocorrelation, non-normality, and model misspecification
- Communicate the results of a complete and well-reasoned regression analysis
The focus of this course will be to provide you with the basic mathematical and computational techniques available for making informed, data-driven decisions using regression models. We will implement the models using the R programming language. We will discuss the following topics:
- Distributional Theory: the Normal, t, Chi-Squared, and F distributions
- Statistical Inference: estimation, hypothesis tests, and confidence intervals
- Simple and Multiple Linear Regression
- Model Building and Variable Selection
- Outlier Detection
- Model Diagnostics: outliers, multicollinearity, non-normality, autocorrelation
- Analysis of Variance (ANOVA)
- Logistic Regression
- Shrinkage Methods: the Lasso and Ridge Regression
The course will be graded based on the following components:
- Assignments (30%): You will be assigned computational and theoretical homework assignments to be completed and turned in on Canvas
- Quizzes (20%): Each week you will be given a short quiz that tests the main lessons taught in class from the previous week. These are given on Mondays at 9:00 AM
- Final Exam (30%): The final exam will be comprehensive, covering the main components of regression analysis.
- Final Project (20%): The final project will be a computational case study that brings together the techniques learned throughout the semester. The description for this project will be provided towards the midpoint of the semester.
- Homework 1. Due Thursday, September 7th at 9:00 AM on Canvas
- Homework 2. Due Thursday, September 21st at 9:00 AM on Canvas
- Homework 3. Due Wednesday, October 11th at 5:00 PM on Canvas
- Case Study. Due Friday, October 6th at 9:00 AM. [Data]; [Data Description]; [Morty House]
Introduction and Motivation
Topic | Reading | Practice | In-Class Code |
---|---|---|---|
Intro and A Brief History of Data Science | Ch. 1 of Doing Data Science | Read this | |
Overview of Machine Learning | Ch. 1 of ISL | | |
Model Building from the Statistical Learning Perspective | Ch. 2 of ISL | | |
Model Fitting and Inference
Topic | Reading | Practice | In-Class Code |
---|---|---|---|
Simple Linear Regression: Model and Estimation | Ch. 3 of ISL | Ch. 3.6.1 - 3.6.3 of ISL | Intro to Regression in R |
Basics of Statistical Inference | | | |
Tests, Confidence Intervals, and Prediction Intervals | Ch. 2 and 3 of Linear Models with R | | |
Shrinkage Methods - Ridge and Lasso | Ch. 6 of ISL | Penalized Regression in R |
Model Diagnostics
Generalized Linear Models
Topic | Reading | Practice | In-Class Code |
---|---|---|---|
Classification and Logistic Regression | Ch. 4.3 of ISL | | |
- Base R Cheat Sheet
- Try R by CodeSchool: quick, interactive R coding lessons
- Swirl (skip steps 1 and 2 if you have already installed R and RStudio): interactive coding lessons in RStudio
- Wednesday, August 23rd - First day of class
- Monday, September 4th - Labor Day Holiday, no class
- Wednesday, October 11th - Last day of class