CGU School of Social Science, Policy & Evaluation
Department of Economic Sciences
Causal Modeling, Big Data and Machine Learning
Fall 2023

Course Instructor: Greg DeAngelo
E-mail: gregory.deangelo@cgu.edu

Course Instructor: Scott Cunningham
E-mail: scunning@gmail.com

Course Instructor: Minjae Yun
E-mail: minjae.yun@cgu.edu

Teaching Assistant: Anuar Assamidanov
E-mail: anuar.assamidanov@cgu.edu

Semester start/end dates: 8/28/2023 – 12/16/2023
Meeting day and time: Tuesdays, 10:00 AM - 11:50 AM Pacific Time
Course Location: Online

Course Description

This course will cover statistical methods based on the machine learning literature that can be used for causal inference. In economics and the social sciences more broadly, empirical analyses typically estimate the effects of counterfactual policies, such as the effect of implementing a government policy, changing a price, showing advertisements, or introducing new products. Recent advances in supervised and unsupervised machine learning provide systematic approaches to model selection and prediction, methods that are particularly well suited to datasets with many observations and/or many covariates.

Background Preparations (Prerequisites)

Econometrics, probability and statistics, basic programming

Student Learning Outcomes

By the end of this course, students will be able to:

  1. Build systematic, reproducible data analysis workflows through programming
  2. Implement machine learning algorithms
  3. Develop a causal identification strategy
  4. Identify the basic assumptions of causal inference as applied to machine learning

Texts and Journal References

  • Required: James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor. "An Introduction to Statistical Learning with Applications in Python." New York: Springer, 2023. (Free PDF: https://www.statlearning.com)
  • Optional: Facure, Matheus. "Causal Inference in Python: Applying Causal Inference in the Tech Industry." 1st Edition. O'Reilly Media, 2023.
  • Optional: Sutton, Richard S., and Andrew G. Barto. "Reinforcement Learning: An Introduction." Second Edition. MIT Press, Cambridge, MA, 2018. (Free PDF: http://incompleteideas.net/book/the-book-2nd.html)
Modules

For each week, required problem sets are assigned. Supplementary readings are also provided for those who wish to delve deeper.

    1. Introduction to Causal Inference and Machine Learning
    2. Data Collection 1: Working with APIs
    3. Machine Learning Fundamentals for Estimating Treatment Effects
    4. Python Programming for Estimating Treatment Effects
    5. Estimating Heterogeneous Treatment Effects
    6. Double/Debiased Machine Learning (DML)*
    7. Introduction to Causal Forests*
    8. Multi-armed Bandits and Causal Decision Making*
    9. Instrumental Variable Lasso (IV Lasso)*
    10. Synthetic Difference-in-Differences (Diff-in-Diffs)
    11. Data Collection 2: Web Scraping
    12. Automating Processes and Data Visualization
    13. Introduction to Unsupervised Learning
    14. Matrix Completion Techniques for "Missing" Data
    *Weeks marked with an asterisk (*) are subject to change as the course curriculum evolves.

    Week 1. Introduction to Causal Inference and Machine Learning

    A recap of econometrics and an overview of statistical learning and supervised/unsupervised machine learning

  • Reading: Athey, Susan, and Guido Imbens (2019) Machine Learning Methods That Economists Should Know About
  • Chapter 6 from An Introduction to Statistical Learning with Applications in Python
  • News article: Data labeling in supervised learning
  • Lecture Note

    Week 2. Data Collection 1: Working with APIs

    Managing covariates from the US Census, the FBI's Uniform Crime Reporting (UCR) program, Twitter, Reddit, and other sources

  • Chapter 7 from An Introduction to Statistical Learning with Applications in Python
  • Basic Programming Lecture Note
  • US Census
  • FBI Crime data
  • Reddit
  • Python package for Reddit
  • Twitter
  • Python package for Twitter
  • Jacob Kaplan's Reservoir

    Week 3. Machine Learning Fundamentals for Estimating Treatment Effects

    The promise of machine learning in estimating treatment effects

  • Lecture Notes from Dr. Brigham Frandsen's workshop
  • Chapter 8 from An Introduction to Statistical Learning with Applications in Python

    Week 4. Python Programming for Estimating Treatment Effects

  • Lecture Notes from Dr. Brigham Frandsen's workshop
  • Chapter 10 from An Introduction to Statistical Learning with Applications in Python

    Week 5. Estimating Heterogeneous Treatment Effects

  • Lecture Notes from Dr. Brigham Frandsen's workshop
  • Reading: Athey, Susan, and Guido Imbens (2016) Recursive Partitioning for Heterogeneous Causal Effects
  • Reading: Chernozhukov, Victor, Mert Demirer, Esther Duflo, and Iván Fernández-Val (2020) Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments, with Application to Immunization in India

    Week 6. Double/Debiased Machine Learning (DML)

    Lecture by Dr. Scott Cunningham

  • Chapter 22 from Causal Inference for the Brave and True
  • Optional Reading: Chapter 4 from Causal Inference in Python
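The partialling-out recipe at the heart of DML can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation used in lecture: it assumes a partially linear model Y = θD + g(X) + ε, uses ridge regression (the hypothetical helper `ridge_fit_predict`) as a stand-in for any ML learner, and runs on simulated data whose true θ is 2. Each observation's outcome and treatment are residualized using models fit on the other folds (cross-fitting), and θ is the slope from regressing the outcome residual on the treatment residual.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Hypothetical stand-in learner: ridge with an unpenalized intercept."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_train.shape[1]),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta


def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta * D + g(X) + error."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # random, balanced fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisance functions E[Y|X] and E[D|X] are fit on the other folds only.
        y_res[test] = y[test] - ridge_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ridge_fit_predict(X[train], d[train], X[test])
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)  # residual-on-residual slope
    psi = (y_res - theta * d_res) * d_res  # orthogonal score evaluated at theta
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


# Hypothetical simulated data: X[:, 0] confounds D and Y; the true theta is 2.
rng = np.random.default_rng(42)
n, p = 2000, 10
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
y = 2.0 * d + X[:, 0] - X[:, 2] + rng.normal(size=n)

theta_hat, se_hat = dml_plr(y, d, X)
naive = np.sum((d - d.mean()) * (y - y.mean())) / np.sum((d - d.mean()) ** 2)
print(f"DML: {theta_hat:.3f} (SE {se_hat:.3f}); naive OLS of Y on D: {naive:.3f}")
```

Swapping the ridge helper for a lasso or random forest changes only the nuisance step; cross-fitting and the final residual-on-residual regression stay the same. In this simulated design the naive regression of Y on D is biased upward because X[:, 0] drives both.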

    Week 7. Introduction to Causal Forests

    Lecture by Dr. Scott Cunningham

  • Reading: Athey, Susan, and Guido Imbens (2016) The Econometrics of Randomized Experiments

    Week 8. Multi-armed Bandits and Causal Decision Making

    Lecture by Dr. Scott Cunningham

  • Optional Reading: Chapter 2 from Reinforcement Learning: An Introduction.
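Chapter 2 of Sutton and Barto introduces the ε-greedy strategy with sample-average value estimates. The sketch below is a minimal illustration, assuming Bernoulli rewards and hypothetical success rates of 0.2, 0.5, and 0.8 (think of three ad variants); `epsilon_greedy` is a hypothetical helper, not a library function. With probability ε the agent explores a random arm; otherwise it pulls the arm with the highest current estimate, then updates that estimate incrementally.

```python
import random


def epsilon_greedy(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms with sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k      # estimated value Q(a) of each arm
    counts = [0] * k   # N(a): number of times each arm has been pulled
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniformly random arm
        else:
            best = max(q)
            arm = rng.choice([a for a in range(k) if q[a] == best])  # exploit, random tie-break
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]  # Q <- Q + (R - Q) / N
        total_reward += reward
    return q, counts, total_reward


# Hypothetical arms with success probabilities 0.2, 0.5, and 0.8.
q, counts, total = epsilon_greedy([0.2, 0.5, 0.8])
print("estimates:", [round(v, 3) for v in q])
print("pulls:", counts, "total reward:", total)
```

Raising ε makes the agent learn about weaker arms faster at the cost of pulling them more often, which is the exploration/exploitation trade-off discussed in that chapter.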

    Week 9. Instrumental Variable Lasso (IV Lasso)

    Lecture by Dr. Scott Cunningham

  • Reading: Belloni, Alexandre, Victor Chernozhukov, and Christian Hansen (2011) LASSO Methods for Gaussian Instrumental Variables Models

    Week 10. Synthetic Difference-in-Differences (Diff-in-Diffs)

  • Reading: Arkhangelsky, Dmitry, Susan Athey, David A. Hirshberg, Guido Imbens, and Stefan Wager (2021) Synthetic Difference in Differences
  • Python package: pysynthdid
  • Data: Castle doctrine

    Week 11. Data Collection 2: Web Scraping

    Collecting information from the web, including news articles, and compiling it into a flat data file

  • Lecture Note
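A minimal sketch of the scrape-to-flat-file step, using only Python's standard library. It assumes the page has already been downloaded (in practice, for example with `urllib.request`, subject to the site's terms of use) and parses a hypothetical HTML snippet instead; the `<h2 class="headline">` and `<time datetime=...>` markup and the `HeadlineParser` class are illustrative assumptions, since every site's structure differs.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical HTML snippet standing in for a downloaded news page.
SAMPLE_HTML = """
<html><body>
  <article><h2 class="headline">City council approves budget</h2>
    <time datetime="2023-09-01">Sep 1</time></article>
  <article><h2 class="headline">New park opens downtown</h2>
    <time datetime="2023-09-03">Sep 3</time></article>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collect one {headline, date} record per <h2 class="headline"> followed by <time>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_headline = False
        self._current = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2" and attrs.get("class") == "headline":
            self._in_headline = True
            self._current = {"headline": ""}
        elif tag == "time" and "headline" in self._current:
            # The machine-readable date lives in the datetime attribute.
            self._current["date"] = attrs.get("datetime", "")
            self.rows.append(self._current)
            self._current = {}

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline:
            self._current["headline"] += data.strip()


parser = HeadlineParser()
parser.feed(SAMPLE_HTML)

# Write the records to a flat CSV (an in-memory buffer here instead of a file on disk).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["headline", "date"])
writer.writeheader()
writer.writerows(parser.rows)
print(buffer.getvalue())
```

Real pages usually call for a more forgiving parser such as Beautiful Soup, but the pattern is the same: locate the tags that hold each field, collect one record per item, and write the records to a CSV.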

    Week 12. Automating Processes and Data Visualization

    Automating workflows for reproducible, systematically managed data analysis

    Week 13. Introduction to Unsupervised Learning

  • Chapter 12 from An Introduction to Statistical Learning with Applications in Python
  • Reading: Ludwig, Jens, and Sendhil Mullainathan (2022) Algorithmic Behavioral Science: Machine Learning as a Tool for Scientific Discovery. Chicago Booth Research Paper No. 22-15. Available at SSRN: https://ssrn.com/abstract=4164272 or http://dx.doi.org/10.2139/ssrn.4164272

    Week 14. Matrix Completion Techniques for "Missing" Data

  • Reading: Athey, Susan, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, and Khashayar Khosravi (2021) Matrix Completion Methods for Causal Panel Data Models
  • Paper example
  • Python Coding
  • R packages: gsynth and MCPanel
  • Data: California smoking data
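The matrix-completion idea treats the untreated outcomes of treated unit-periods as missing entries of a low-rank panel and fills them in with a nuclear-norm-penalized fit. The sketch below is a simplified illustration, not the MCPanel estimator: it uses the soft-impute iteration (repeated singular-value soft-thresholding), omits the unit and time fixed effects that Athey et al. (2021) include, and uses a hand-picked penalty `lam=1.0` on simulated rank-2 data rather than a data-driven choice. `soft_impute` is a hypothetical helper.

```python
import numpy as np


def soft_impute(Y, observed, lam, n_iters=500, tol=1e-6):
    """Approximately minimize 0.5 * ||P_obs(Y - L)||_F^2 + lam * ||L||_* via soft-impute."""
    L = np.zeros_like(Y, dtype=float)
    for _ in range(n_iters):
        # Observed cells keep their data; missing cells take the current low-rank fit.
        filled = np.where(observed, Y, L)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        L_new = (U * s_shrunk) @ Vt
        converged = np.linalg.norm(L_new - L) <= tol * max(1.0, np.linalg.norm(L))
        L = L_new
        if converged:
            break
    return L


# Hypothetical simulated panel: 30 units x 20 periods, rank-2 signal plus small noise.
rng = np.random.default_rng(0)
N, T, rank = 30, 20, 2
L_true = rng.normal(size=(N, rank)) @ rng.normal(size=(rank, T))
Y = L_true + 0.1 * rng.normal(size=(N, T))

# The last 5 units are treated in the last 5 periods, so their untreated outcomes are missing.
observed = np.ones((N, T), dtype=bool)
observed[-5:, -5:] = False

L_hat = soft_impute(Y, observed, lam=1.0)
rmse = np.sqrt(np.mean((L_hat[~observed] - L_true[~observed]) ** 2))
baseline = np.sqrt(np.mean(L_true[~observed] ** 2))  # error from simply imputing zeros
print(f"RMSE on missing cells: {rmse:.3f} (zero-fill baseline: {baseline:.3f})")
```

With real data, the estimated effect for each treated cell would be its observed (treated) outcome minus the imputed value in `L_hat`.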

Grading

Your grade will be calculated using the following scale. Grades with plus or minus designations are at the professor’s discretion.

Letter Grade | Grade Point | Description | Learning Outcome
A | 4.0 | Complete mastery of course material and additional insight beyond course material (overall grade ≥ 90%) | Insightful
B | 3.0 | Complete mastery of course material (80% ≤ overall grade < 90%) | Proficient
C | 2.0 | Gaps in mastery of course material; not at the level expected by the program (65% ≤ overall grade < 80%) | Developing
U | 0.0 | Unsatisfactory (overall grade < 65%) | Ineffective

If I learn of any potential violation of our gender-based misconduct policy (rape, sexual assault, dating violence, domestic violence, or stalking) by any means, I am required to notify the CGU Title IX Coordinator at Deanof.Students@cgu.edu or (909) 607-9448. Students can request confidentiality from the institution, which I will communicate to the Title IX Coordinator. If students want to speak with someone confidentially, the following resources are available on and off campus: EmPOWER Center (909) 607-2689, Monsour Counseling and Psychological Services (909) 621-8202, and The Chaplains of the Claremont Colleges (909) 621-8685. Speaking with a confidential resource does not preclude students from making a formal report to the Title IX Coordinator if and when they are ready. Confidential resources can walk students through all of their reporting options. They can also provide students with information and assistance in accessing academic, medical, and other support services they may need.