Part of SMU's Master of Science in Data Science program
- Megan Ball
- Amber Clark
- Matt Farrow
- Blake Freeman
The data set that our group selected comes from the U.S. Department of Transportation's Bureau of Transportation Statistics and "tracks the on-time performance of domestic flights operated by large air carriers" during 2015 (Kaggle). The data is split across three .csv files: one with flight details for 5,819,079 flights during 2015, one listing 14 U.S.-based airlines, and one with the geographic details of 322 U.S. airports.
The data is important because it takes airline delays and cancellations, a frequent source of public complaints about air travel, and quantifies them in a way that offers the possibility of tangible analysis. For the purposes of this analysis, the two variables that we intend to measure are ARRIVAL_DELAY and CANCELLATION.
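As a rough illustration of how the two target variables might be summarized, the sketch below computes a mean arrival delay and a cancellation rate from CSV rows. It is not taken from the lab notebooks. The helper `summarize` is hypothetical, and the sample rows are made up so that the block runs without the data files. It also assumes that cancellations are coded as `0`/`1` in a column named as written above, which may not match the header in the actual flights file.

```python
import csv
import io
from statistics import mean

# Made-up rows for illustration only; these are NOT values from the real data set.
# Column names follow the README text; the real file's header may differ.
SAMPLE_CSV = """AIRLINE,ARRIVAL_DELAY,CANCELLATION
AA,12,0
AA,-5,0
DL,,1
DL,30,0
"""


def summarize(rows, delay_col="ARRIVAL_DELAY", cancel_col="CANCELLATION"):
    """Return (mean arrival delay, cancellation rate) for an iterable of dict rows.

    Assumes cancellations are coded as the string "1" and that a missing
    delay appears as an empty string (e.g. for a cancelled flight).
    """
    delays = []
    cancelled = 0
    total = 0
    for row in rows:
        total += 1
        if row[cancel_col] == "1":
            cancelled += 1
        if row[delay_col] != "":
            delays.append(float(row[delay_col]))
    mean_delay = mean(delays) if delays else None
    cancel_rate = cancelled / total if total else None
    return mean_delay, cancel_rate


# Read the in-memory sample the same way a file on disk would be read.
mean_delay, cancel_rate = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(mean_delay, cancel_rate)
```

To run the same summary on the real flights file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample, with `path` pointing to wherever the file is stored locally. Adjust `cancel_col` if the header uses a different name.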
- Lab 1: Visualization & Data Preprocessing (Instructions, Jupyter Notebook)
- Mini-Lab: Logistic Regression & SVM (Instructions, Jupyter Notebook)
- Lab 2: Classification (Instructions, Jupyter Notebook)
- Lab 3: Clustering, Association Rules, or Recommenders (Instructions)