# noshows

Predicting no-shows from appointment data

Using the no-show appointments dataset (https://www.kaggle.com/joniarroba/noshowappointments), a Kaggle dataset of medical appointments, we attempt to predict whether a patient will show up for their appointment. Only about a quarter of the patients are no-shows, so the classes are heavily imbalanced; in this repo we show that augmenting the data with generated (synthetic) no-show examples improves the performance of no-show classifiers. See the results below.
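
The result tables appear to be PyCaret `compare_models()` summaries (same columns, same model IDs). As a minimal sketch of the baseline run, assuming PyCaret is installed and the Kaggle CSV is saved under its usual name (`KaggleV2-May-2016.csv`):

```python
import pandas as pd
from pycaret.classification import setup, compare_models

# Load the Kaggle CSV (filename assumed; adjust to your local copy).
df = pd.read_csv("KaggleV2-May-2016.csv")

# 'No-show' is the target: 'Yes' means the patient missed the appointment.
# value_counts shows the class imbalance described above.
print(df["No-show"].value_counts(normalize=True))

# Score a suite of classifiers; compare_models() reports the same columns
# (Accuracy, AUC, Recall, Prec., F1, Kappa, MCC, TT) as the tables below.
setup(data=df, target="No-show", session_id=42)
best = compare_models()
```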

Results on the original dataset:

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| catboost | CatBoost Classifier | 0.8026 | 0.7461 | 0.0778 | 0.5843 | 0.1372 | 0.0942 | 0.1582 | 14.866 |
| lightgbm | Light Gradient Boosting Machine | 0.8015 | 0.7433 | 0.0376 | 0.6444 | 0.0711 | 0.05 | 0.1204 | 39.915 |
| xgboost | Extreme Gradient Boosting | 0.8003 | 0.7431 | 0.092 | 0.5332 | 0.1569 | 0.1035 | 0.1567 | 6.864 |
| rf | Random Forest Classifier | 0.8022 | 0.7411 | 0.1601 | 0.5339 | 0.2463 | 0.169 | 0.21 | 4.068 |
| gbc | Gradient Boosting Classifier | 0.7984 | 0.7332 | 0.0067 | 0.6078 | 0.0132 | 0.0086 | 0.0463 | 3.843 |
| ada | Ada Boost Classifier | 0.7976 | 0.7282 | 0.0168 | 0.463 | 0.0323 | 0.0186 | 0.0557 | 0.924 |
| et | Extra Trees Classifier | 0.7905 | 0.726 | 0.1991 | 0.4573 | 0.2773 | 0.1765 | 0.1974 | 6.206 |
| lda | Linear Discriminant Analysis | 0.791 | 0.681 | 0.0436 | 0.3569 | 0.0776 | 0.0353 | 0.0613 | 1.368 |
| lr | Logistic Regression | 0.7954 | 0.6784 | 0.025 | 0.398 | 0.0471 | 0.0236 | 0.0552 | 5.518 |
| knn | K Neighbors Classifier | 0.7778 | 0.6744 | 0.2076 | 0.403 | 0.2739 | 0.1583 | 0.1705 | 9.431 |
| nb | Naive Bayes | 0.2345 | 0.5988 | 0.9611 | 0.204 | 0.3365 | 0.005 | 0.0206 | 0.087 |
| dt | Decision Tree Classifier | 0.7344 | 0.5862 | 0.3361 | 0.3404 | 0.3382 | 0.1721 | 0.1721 | 0.482 |
| qda | Quadratic Discriminant Analysis | 0.5235 | 0.5083 | 0.4827 | 0.2066 | 0.2772 | 0.0098 | 0.0144 | 7.784 |
| svm | SVM - Linear Kernel | 0.7981 | 0 | 0 | 0 | 0 | 0 | 0 | 0.218 |
| ridge | Ridge Classifier | 0.7976 | 0 | 0.0092 | 0.4512 | 0.018 | 0.01 | 0.0396 | 0.077 |
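
This summary doesn't spell out how the extra no-shows were generated. As one common stand-in, here is a minimal sketch using SMOTE from imbalanced-learn to oversample the minority class; the repo's actual generator may differ:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

df = pd.read_csv("KaggleV2-May-2016.csv")  # filename assumed, as above

# Keep numeric features only for this sketch; a full pipeline would also
# encode the categorical and date columns before resampling.
X = df.select_dtypes("number")
y = df["No-show"]

# Interpolate new minority-class (no-show) rows until the classes balance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(pd.Series(y_res).value_counts())
```

With the classes balanced this way (or with whatever generator the notebook actually uses), the benchmark is re-run to produce the scores below.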

Results with the synthetically augmented dataset:

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| catboost | CatBoost Classifier | 0.86 | 0.9333 | 0.7607 | 0.9283 | 0.7721 | 0.7213 | 0.7438 | 20.435 |
| xgboost | Extreme Gradient Boosting | 0.8552 | 0.93 | 0.7629 | 0.9094 | 0.7726 | 0.7116 | 0.7324 | 8.432 |
| rf | Random Forest Classifier | 0.8521 | 0.9292 | 0.7843 | 0.8887 | 0.7896 | 0.7052 | 0.7262 | 6.809 |
| lightgbm | Light Gradient Boosting Machine | 0.8534 | 0.9273 | 0.7582 | 0.9005 | 0.7707 | 0.7079 | 0.7257 | 1.061 |
| et | Extra Trees Classifier | 0.8393 | 0.9183 | 0.8018 | 0.8482 | 0.7943 | 0.6794 | 0.6974 | 11.339 |
| gbc | Gradient Boosting Classifier | 0.8092 | 0.9063 | 0.7941 | 0.8015 | 0.7782 | 0.619 | 0.6325 | 6.779 |
| knn | K Neighbors Classifier | 0.8018 | 0.8786 | 0.7211 | 0.8423 | 0.7494 | 0.6045 | 0.6188 | 22.901 |
| ada | Ada Boost Classifier | 0.7615 | 0.8589 | 0.776 | 0.7462 | 0.7521 | 0.5231 | 0.5322 | 1.469 |
| lr | Logistic Regression | 0.7455 | 0.8153 | 0.7272 | 0.7437 | 0.7262 | 0.4913 | 0.4969 | 5.689 |
| lda | Linear Discriminant Analysis | 0.7457 | 0.8148 | 0.7273 | 0.7446 | 0.727 | 0.4918 | 0.4973 | 2.22 |
| dt | Decision Tree Classifier | 0.8098 | 0.8101 | 0.796 | 0.8 | 0.7767 | 0.6201 | 0.6349 | 0.719 |
| nb | Naive Bayes | 0.5981 | 0.7097 | 0.2909 | 0.7307 | 0.4108 | 0.1992 | 0.2414 | 0.125 |
| qda | Quadratic Discriminant Analysis | 0.5032 | 0.5046 | 0.2655 | 0.5167 | 0.3194 | 0.0092 | 0.0121 | 2.023 |
| svm | SVM - Linear Kernel | 0.7413 | 0 | 0.7165 | 0.745 | 0.7236 | 0.483 | 0.4875 | 0.374 |
| ridge | Ridge Classifier | 0.7457 | 0 | 0.7273 | 0.7446 | 0.727 | 0.4918 | 0.4973 | 0.088 |