Data Science: Inference and Modeling

HarvardX: PH125.4x | Data Science: Inference and Modeling

Abstract

This is the fourth course in the Professional Certificate in Data Science program, a series that prepares you to do data analysis in R, from simple computations to machine learning. Statistical inference and modeling are indispensable for analyzing data affected by chance, and thus essential for data scientists. In this course, you will learn these key concepts through a motivating case study on election forecasting.

This course shows how inference and modeling can be applied to develop the statistical approaches that make polls an effective tool, and how to carry them out in R. You will learn the concepts needed to define estimates and margins of error, and how to use them to make reasonably accurate predictions and to quantify the precision of your forecasts.

Once you have learned this, you will be able to understand two concepts that are ubiquitous in data science: confidence intervals and p-values.

Finally, to understand statements about the probability of a candidate winning, you will learn about Bayesian modeling. At the end of the course, we will put it all together to recreate a simplified version of an election forecast model and apply it to the 2016 US presidential election.

The bookdown version of this course is available on this GitHub page.