This intermediate applied econometrics course covers the theoretical, computational, and statistical underpinnings of big data analysis. The focus is on econometric models and machine learning techniques for analyzing high-dimensional data sets (a.k.a. “Big Data”) and on their implications for research into economic questions that arise from rapid changes in data availability and computational technology. Big data econometric models provide a vehicle for modeling and analyzing complex phenomena and for incorporating rich sources of confounding information into economic models. The goal of this course is to give an applied, hands-on introduction to these methods. By the end of the course, students will be able to read and understand theoretical papers on the subject, implement the techniques themselves in Python, and apply them to data used in economics and business. The data sets used in this course come from the World Bank Group, Kaggle, Federal Reserve Economic Data, Google Finance, and several other sources.
Pre: ECON 3254 or ECON 4304 or CMDA 3654 or STAT 3006. (3H, 3C).
Syllabus:
Preliminaries
- Overview of Big Data and Big Data Visualization
- Python Programming (NumPy, SciPy, pandas, matplotlib, scikit-learn, PyTorch)
- Linear Algebra and Optimization for Machine Learning
- Regression Analysis (Matrix Formulation, OLS, MLE, SGD, Logistic & Polynomial Regression)
- Curse of Dimensionality
- Bash Scripting and Shell Programming
- High-Performance Computing (VT ARC and Google Colab)
Model Selection and Feature Extraction
- Regression with Many Regressors: Standard Approaches to Model Selection Algorithms
- Penalized Regression Methods: Lasso, Ridge, and Elastic Net
- Linear Dimensionality Reduction with an Emphasis on PCA
- Factor Models: Estimation and Inference
- Economic Forecasting in a Big Data Environment
- Estimation of Large Covariance and Precision Matrices
- Feature Selection from an Information-Theoretic Perspective
- A Brief Introduction to Bayesian Inference and Bayesian VARs
Deep Learning in Big Data Analytics
- Nonlinearity in Big Data Sets and Nonlinear Dimensionality Reduction
- Neural Networks and Deep Learning Autoencoders
- Double Machine Learning for Treatment and Causal Inference