/Flight_Arrival_Prediction

The data covers 2008 flight arrival times at Denver airport. We use it to predict arrival times so that a prospective passenger can decide which flight to book and avoid being delayed.

Primary Language: Jupyter Notebook

Predicting Arrival Times for Denver airport using Random Forests and Naive Bayes

Correlation matrix of the independent variables with the dependent variable
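A correlation of each feature with the target can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the toy values are made up, and the feature column names (DepDelay, Distance, TaxiOut) are assumptions; only ArrDelay is named in this README.

```python
import pandas as pd

# Toy stand-in for the 2008 Denver flight data; all values are invented.
# Column names other than ArrDelay are assumptions for illustration.
flights = pd.DataFrame({
    "DepDelay": [0, 5, 30, -3, 12, 45],
    "Distance": [600, 850, 600, 1200, 850, 300],
    "TaxiOut": [10, 14, 25, 9, 16, 30],
    "ArrDelay": [-2, 4, 35, -8, 10, 50],
})

# Pearson correlation of every feature with the target,
# ordered by absolute strength (strongest first).
corr_with_target = (
    flights.corr()["ArrDelay"]
    .drop("ArrDelay")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Calling `flights.corr()` on its own gives the full matrix, which is what a heatmap of the kind captioned above would typically be drawn from.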

Models Tried:

  • Random Forest Regressor
  • Random Forest Classifier
  • Naive Bayes Gaussian Classifier

Random Forest Models:

Significance of the variables in predicting the Arrival Delay as identified by the Random Forest model:

Plot
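As a rough sketch of how such variable significance is usually read from a fitted forest, the example below uses scikit-learn's `feature_importances_` attribute on synthetic data. The feature names and the way the target is generated are assumptions made only so the example runs; they do not reproduce the project's data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data; feature names are hypothetical stand-ins.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "DepDelay": rng.normal(10, 20, n),
    "Distance": rng.uniform(200, 2000, n),
    "TaxiOut": rng.uniform(5, 40, n),
})
# Assumed relationship: arrival delay is driven mostly by departure delay.
y = X["DepDelay"] + 0.3 * X["TaxiOut"] + rng.normal(0, 3, n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

A bar chart of `importances` (for example via `importances.plot.barh()`) would give a plot of the kind referenced above.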

Hyper-Parameter Tuning: The following are the Random Forest parameters that we experiment with to find the best configuration.

  • n_estimators = number of trees in the forest
  • max_features = max number of features considered for splitting a node
  • max_depth = max number of levels in each decision tree
  • min_samples_split = min number of data points placed in a node before the node is split
  • min_samples_leaf = min number of data points allowed in a leaf node
  • bootstrap = method for sampling data points (with or without replacement)
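A search over the parameters listed above might look like the sketch below. The README does not say which search strategy was used, so `RandomizedSearchCV`, the candidate values, and the synthetic data are all assumptions. Note that `max_features='auto'` was removed in recent scikit-learn releases, so the sketch uses `'sqrt'` and `'log2'` instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the flight features and delay classes.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Candidate values are illustrative, not the grid used in the project.
param_distributions = {
    "n_estimators": [20, 50, 100],
    "max_features": ["sqrt", "log2"],
    "max_depth": [5, 10, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

# Sample 5 random combinations and score each with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

A randomized search keeps the cost bounded by `n_iter`; an exhaustive `GridSearchCV` over the same dictionary would be the alternative when the grid is small.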

Best Parameters:

  • n_estimators : 400
  • min_samples_split : 10
  • min_samples_leaf : 1
  • max_features : 'auto'
  • max_depth : 20
  • bootstrap : True

ArrDelay categories for Random Forest Classifier Model:

Classification bin labels

  • Less than -30 minutes : Very Early
  • Less than 0 minutes & Greater than or equal to -30 minutes : Early
  • 0 minutes to 5 minutes : Ontime
  • 5 minutes to 30 minutes : Late
  • Greater than 30 minutes : Very late
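The binning above can be sketched with `pandas.cut`. The list does not say which bin a delay of exactly 5 or 30 minutes falls into, so this sketch assumes left-closed intervals (5 counts as Late, 30 as Very late); the sample delays are made up.

```python
import numpy as np
import pandas as pd

# Made-up arrival delays in minutes, chosen to hit every bin edge.
arr_delay = pd.Series([-45, -30, -1, 0, 4, 5, 29, 30, 90], name="ArrDelay")

# Bin edges taken from the list above; right=False makes each interval
# left-closed, e.g. [-30, 0) for Early. Edge handling at 5 and 30 is an
# assumption, since the source leaves it unspecified.
bins = [-np.inf, -30, 0, 5, 30, np.inf]
labels = ["Very Early", "Early", "Ontime", "Late", "Very late"]

arr_delay_class = pd.cut(arr_delay, bins=bins, labels=labels, right=False)
print(pd.DataFrame({"ArrDelay": arr_delay, "class": arr_delay_class}))
```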

Conclusion:

The accuracy of our Random Forest classifier is 81%, while our Naive Bayes classifier reaches only 70%. We obtained the best Random Forest classifier through hyper-parameter tuning. However, we could not tune the Naive Bayes classifier, because it exposes essentially no tunable hyper-parameters other than the class priors, which we do not know.
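A comparison of this kind can be sketched as below. The data is synthetic with five classes standing in for the delay categories, so the accuracies it prints will not match the figures reported above. In scikit-learn, `GaussianNB` accepts `priors` and a numerical-stability term `var_smoothing`; both are left at their defaults here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 5-class problem; a stand-in for the binned ArrDelay target.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Fit each model on the same split and record held-out accuracy.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracy[name]:.2f}")
```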

Reference: