We used the Hotel Booking Demand dataset from Kaggle to predict whether a hotel reservation would be canceled. The dataset is large, with 119,390 bookings and 32 variables, and covers bookings from 2015 to 2017 at two hotels in Portugal: a city hotel and a resort hotel. The variables include when the booking was made, reserved room type, arrival day, length of stay (week and weekend nights), whether the customer is a repeat guest, lead time (number of days between booking date and arrival date), and cancellation status for the current booking and for the guest's previous bookings. This data could be used to answer many questions. We decided to focus on reservation cancellations because they have a substantial impact on hotel revenue and profitability.
-
Hotel Booking Cancellation Prediction Report: https://github.com/shihyuanwang/Hotel_Booking_Cancellation_Prediction/blob/main/Hotel%20Booking%20Cancellation%20Prediction.pdf
-
R Code:
* Association Rules, KNN, and Random Forests: https://github.com/shihyuanwang/Hotel_Booking_Cancellation_Prediction/blob/main/Hotel%20Booking%20Cancellation_Association_KNN_RF.Rmd
* Logistic Regression and Classification Trees: https://github.com/shihyuanwang/Hotel_Booking_Cancellation_Prediction/blob/main/Hotel%20Booking%20Cancellation_LogisticRegression_and_ClassificationTree.Rmd
Data Source: Kaggle - Hotel Booking Demand: https://www.kaggle.com/jessemostipak/hotel-booking-demand
Co-authored by: Karan Modi, Folarin Omotoriogun, Mark Tegeler, Yujie Wen