DS-2006

This repository contains the course materials for DS 2006 (Computational Probability).

Primary language: R. License: MIT.

Computational Probability, Spring 2024

Overview

This course is all about variation, uncertainty, and randomness. Students will learn the vocabulary of uncertainty and the mathematical and computational tools to understand and describe it.

Instructor

Thomas Stewart
Elson Building, 400 Brandon Ave, Room 156
thomas.stewart@virginia.edu
GitHub: thomasgstewart

Teaching assistants

Ethan Nelson
Graduate student in Data Science
ean8fr@virginia.edu
GitHub: eanelson01

Instruction & Office hours

Format of the class: In-class time will be a combination of lectures, group assignments, live coding, and student presentations. Please note: Circumstances may require the face-to-face portion of the class to be online.

Time: MWF, 10 - 10:50am, Dell 1 Room 105

Instructor Office Hours: MW, 11am, Dell 1 Commons (The instructor will leave if there are no questions after 15 minutes.)

TA Office Hours: Thursdays, 1pm, Dell 1 Commons

Textbooks

The following textbooks are freely available online via the UVA library.

Understanding Uncertainty
by Dennis V. Lindley

Understanding Probability, 3rd edition
by Henk Tijms

Introduction to Probability: Models and Applications
by N. Balakrishnan, Markos V. Koutras, Konstadinos G. Politis

The following textbooks may also be helpful.

Probability and Statistics for Data Science
by Norman Matloff

Introduction to Probability Models
by Sheldon M. Ross

Computing

The course will be taught using R.

Big ideas & Learning Outcomes

The following are the four ideas that I hope will persist with students after the minutiae of the Poisson distribution have faded from memory. The associated learning outcomes and topics are listed under each idea.

Probability is a framework for organizing beliefs; it is not a statement of what your beliefs should be.
Learning outcomes and topics
compare and contrast different definitions of probability, illustrating differences with simple examples
  • long-run proportion
  • personal beliefs
  • combination of beliefs and data
express the rules of probability verbally, mathematically, and computationally
  • AND, OR, complement, total probability
  • simulation error (relative and absolute)
illustrate the rules of probability with examples
using the long-run proportion definition of probability, derive the univariate rules of probability
organize/express bivariate random variables in cross tables
define joint, conditional, and marginal probabilities
identify joint, conditional, and marginal probabilities in cross tables
identify when a research question calls for a joint, conditional, or marginal probability
describe the connection between conditional probabilities and prediction
derive Bayes' rule from cross tables
apply Bayes' rule to answer research questions
determine if joint outcomes are independent
calculate a measure of association between joint outcomes
apply the cross table framework to the special case of binary outcomes
  • Sensitivity
  • Specificity
  • Positive predictive value
  • Negative predictive value
  • Prevalence
  • Incidence
define/describe confounding variables
  • Simpson's paradox
  • DAGs
  • causal pathway
list approaches for avoiding confounding
  • stratification
  • randomization
Probability models are a powerful framework for describing and simplifying real world phenomena as a means of answering research questions.
Learning outcomes and topics
list various data types
match each data type with probability models that may describe it
  • Bernoulli
  • binomial
  • negative binomial
  • Poisson
  • Gaussian
  • gamma
  • mixture
discuss the degree to which models describe the underlying data
tease apart model fit and model utility
express probability models mathematically, computationally, and graphically
  • PMF/PDF
  • CMF/CDF
  • quantile function
  • histogram/eCDF
employ probability models (computationally and analytically) to answer research questions
explain and implement different approaches for fitting probability models from data
  • Tuning
  • Method of Moments
  • Maximum likelihood
  • Bayesian posterior
  • kernel density estimation
visualize the uncertainty inherent in fitting probability models from data
  • sampling distribution
  • posterior distribution
  • bootstrap distribution
explore how to communicate uncertainty when constructing models and answering research questions
  • confidence intervals
  • support intervals
  • credible intervals
  • bootstrap intervals
propagate uncertainty in simulations
explore the trade-offs of model complexity and generalizability
Probability is a framework for coherently updating beliefs based on new information and data.
Learning outcomes and topics
select prior distributions which reflect personal belief
  • informative vs weakly informative priors
implement Bayesian updating
manipulate the posterior distribution to answer research questions
Probability models can be expressed and applied mathematically and computationally.
Learning outcomes and topics
use probability models to build simulations of complex real world processes to answer research questions

Grading

Courses carrying a Data Science subject area use the following grading system: A, A-; B+, B, B-; C+, C, C-; D+, D, D-; F. The symbol W is used when a student officially drops a course before its completion or if the student withdraws from an academic program of the University.

Grading Scale:

  • 93-100 A
  • 90-92 A-
  • 87-89 B+
  • 83-86 B
  • 80-82 B-
  • 77-79 C+
  • 73-76 C
  • 70-72 C-
  • <70 F

Grades will be a weighted average of the final exam score (30%), the midterm exams (each 15%), the deliverables (20%) and homeworks (20%).
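As a rough illustration of the weighting, the sketch below computes the weighted average in Python (the course itself uses R). It assumes two midterms and assumes every component has already been converted to a 0-100 scale; how homework and deliverable points map to that scale is not specified here. The name `course_score` and the example scores are hypothetical.

```python
# Weights as stated above: final 30%, each midterm 15% (two midterms assumed),
# deliverables 20%, homeworks 20%.
WEIGHTS = {
    "final": 0.30,
    "midterm1": 0.15,
    "midterm2": 0.15,
    "deliverables": 0.20,
    "homeworks": 0.20,
}


def course_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "final": 85,
    "midterm1": 88,
    "midterm2": 88,
    "deliverables": 90,
    "homeworks": 95,
}
print(round(course_score(example), 2))
```

The weights sum to 1, so a student with the same score on every component receives that score as the weighted average.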

Individual homeworks are graded with a score of 0, 1, or 2. After the initial grading, students may resubmit homework within one week of feedback for an additional point. That is, an initial score of 1 can be bumped up to a 2. Likewise, a 0 can be bumped up to a 1.

Deliverables are larger assignments than homework. To complete the deliverables, you will use probability models to build simulations of complex real world processes to answer questions. Deliverables are graded like homeworks, including the opportunity to resubmit for an additional point.

Midterm exams are graded on a 100 point scale. For midterm 1, if your grade on midterm 2 or the final is higher, the higher score will replace the score for midterm 1. Likewise, for midterm 2, if your grade on the final exam is higher, the higher score will replace the score for midterm 2. For example, suppose your exam scores for the midterms and final were 72, 88, 85. For the purposes of the final grade, your exam scores would be 88, 88, 85.
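The replacement rule can be sketched in a few lines. This is an illustrative Python version (the course itself uses R), and the function name `effective_exam_scores` is hypothetical; it simply takes the maximum over the scores that are allowed to replace each midterm.

```python
def effective_exam_scores(midterm1, midterm2, final):
    """Apply the replacement rule to (midterm 1, midterm 2, final) scores."""
    # Midterm 2 is replaced by the final exam score if the final is higher.
    eff_midterm2 = max(midterm2, final)
    # Midterm 1 is replaced by the higher of midterm 2 and the final,
    # if either of them is higher than midterm 1.
    eff_midterm1 = max(midterm1, midterm2, final)
    return eff_midterm1, eff_midterm2, final


# The example from the text: 72, 88, 85 becomes 88, 88, 85.
print(effective_exam_scores(72, 88, 85))
```

Under this rule a later exam can only raise an earlier midterm score, never lower it.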

The final exam is Thursday, May 9 at 9:00am, as assigned by the university. Approximately one week prior to the exam, the instructor will provide a set of questions for which students will prepare solutions and written explanations. During the final exam period, the instructor will provide a supplementary set of questions related to the first set. For example, the instructor may ask:

  • Please explain how you solved a particular question in the initial set.
  • Please solve a new question (perhaps closely related to a question in the initial set).
  • Please explain course topic X.

Students will be graded on both the accuracy of their responses and the clarity with which they explain course concepts and solutions to questions.

2024 Calendar

Homeworks, deliverables, reading assignments, and exams will be posted on the course calendar below.

Mon Tue Wed Thu Fri
Jan
 
17
Survey/GitHub setup
19
Reading: Get started guide
Intro R
22
Reading: Intro Markdown
Markdown Cheatsheet
24
Tools
Reproducible Reports
26
DUE: HW1
Reading: (optional) First 5 videos of Learn R Programming
(optional) Intro to VS Code
(optional) Using Git with Visual Studio Code. Note that you have already cloned your repo locally, whereas the video creates a fresh repo.
29
DUE: HW2
RStudio on Rivanna
31
 
Feb
 
2
DUE: HW3
Reading: Understanding Uncertainty, CH 1
5
DUE: HW4
7
DUE: HW5
DUE: HW1 Resubmission
Operating Characteristics
9
DUE: HW6
DUE: HW2 Resubmission
12
DUE: HW7
DUE: HW3 Resubmission
Rules of prob 1
Rules of prob 2
14
Exam review
Prep questions
DUE: HW8
16
Exam: You will be given a set of prep questions on Feb 14. Generate solutions to the prep questions prior to the in-class exam. During the exam, you will be given test questions similar to the prep questions. You will be able to copy, paste, and tweak your solutions to the prep questions to solve the exam questions.
DUE: HW4 Resubmission
19
Read/Watch Deliverable 1
DUE: HW5 Resubmission
21
Work on Deliverable 1
23
DUE: Deliverable 1
DUE: HW6 Resubmission
26
 
28
DUE: HW9
Mar
DUE: HW10
Diagnostics
4
Spring break
6
Spring break
8
Spring break
11
In class: Deliverable 2
13
 
14
DUE: Deliverable 2
15
 
18
Data types
DUE: HW11
DUE: HW 7 Resubmission
20
DUE: HW 8 Resubmission
22
HW 12 
25
HW 13
27
Exam review
Prep questions
29
Exam: You will be given a set of prep questions on Mar 27. Generate solutions to the prep questions prior to the in-class exam. During the exam, you will be given test questions similar to the prep questions. You will be able to copy, paste, and tweak your solutions to the prep questions to solve the exam questions.
Apr
In class code (Prob tom)
Bernoulli (Binomial)
Hands/Sequences
3
 
5
Bernoulli sequences
8
DUE: HW 12 Resubmission
10
DUE: Deliverable 1 Resubmission
 
12
DUE: HW 14 
15
DUE: HW 11 Resubmission
17
 
19
DUE: HW 15 
22
DUE: Deliverable 2 Resubmission
24
 
26
KDE
KDE part 2 
29
Last class
Exam review
May
DUE: HW 13 Resubmission
DUE: HW 14 Resubmission
3
 
6
 
8
 
9
Final exam
9:00am - 12:00pm

Adjustments

The instructor may alter the course content and grading policies during the semester.

Collaborative learning

Students are encouraged to study together. The instructions for each assignment/deliverable will indicate if and how students may work together. Students should not collaborate on midterm or final exams. Students who violate the collaborative-work policy on an assignment, deliverable, or exam will receive a score of 0 on that assignment, deliverable, or exam. Students may also be referred to the UVA Honor Committee.

University of Virginia Honor System. All work should be pledged in the spirit of the Honor System at the University of Virginia. The following pledge should be written out at the end of all quizzes, examinations, individual assignments, and papers: “I pledge that I have neither given nor received help on this examination (quiz, assignment, etc.)”. The pledge must be signed by the student. For more information, visit www.virginia.edu/honor.

Accommodations

UVA is committed to creating a learning environment that meets the needs of its diverse student body. If you anticipate or experience any barriers to learning in this course, please feel welcome to discuss your concerns with me. If you have a disability, or think you may have a disability, you may also want to meet with the Student Disability Access Center (SDAC) to request an official accommodation. You can find more information about SDAC, including how to apply online, through their website at www.studenthealth.virginia.edu/SDAC. If you have already been approved for accommodations through SDAC, please make sure to send me your accommodation letter and meet with me so we can develop an implementation plan together.