# noshows

Predicting no-shows from appointment data

Using the no-show appointments dataset (https://www.kaggle.com/joniarroba/noshowappointments), a Kaggle dataset of medical appointments, we attempt to predict whether a patient will show up for their appointment. Only about a quarter of the patients are no-shows, so the classes are heavily imbalanced; in this repo we show that augmenting the data with generated (synthetic) no-show examples improves the performance of no-show classifiers. See the results below.
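
The result tables appear to be PyCaret `compare_models()` summaries (same columns, same model IDs). As a minimal sketch of the baseline run, assuming PyCaret is installed and the Kaggle CSV is saved under its usual name (`KaggleV2-May-2016.csv`):

```python
import pandas as pd
from pycaret.classification import setup, compare_models

# Load the Kaggle CSV (filename assumed; adjust to your local copy).
df = pd.read_csv("KaggleV2-May-2016.csv")

# 'No-show' is the target: 'Yes' means the patient missed the appointment.
# value_counts shows the class imbalance described above.
print(df["No-show"].value_counts(normalize=True))

# Score a suite of classifiers; compare_models() reports the same columns
# (Accuracy, AUC, Recall, Prec., F1, Kappa, MCC, TT) as the tables below.
setup(data=df, target="No-show", session_id=42)
best = compare_models()
```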

Results on the original dataset:

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| catboost | CatBoost Classifier | 0.8026 | 0.7461 | 0.0778 | 0.5843 | 0.1372 | 0.0942 | 0.1582 | 14.866 |
| lightgbm | Light Gradient Boosting Machine | 0.8015 | 0.7433 | 0.0376 | 0.6444 | 0.0711 | 0.05 | 0.1204 | 39.915 |
| xgboost | Extreme Gradient Boosting | 0.8003 | 0.7431 | 0.092 | 0.5332 | 0.1569 | 0.1035 | 0.1567 | 6.864 |
| rf | Random Forest Classifier | 0.8022 | 0.7411 | 0.1601 | 0.5339 | 0.2463 | 0.169 | 0.21 | 4.068 |
| gbc | Gradient Boosting Classifier | 0.7984 | 0.7332 | 0.0067 | 0.6078 | 0.0132 | 0.0086 | 0.0463 | 3.843 |
| ada | Ada Boost Classifier | 0.7976 | 0.7282 | 0.0168 | 0.463 | 0.0323 | 0.0186 | 0.0557 | 0.924 |
| et | Extra Trees Classifier | 0.7905 | 0.726 | 0.1991 | 0.4573 | 0.2773 | 0.1765 | 0.1974 | 6.206 |
| lda | Linear Discriminant Analysis | 0.791 | 0.681 | 0.0436 | 0.3569 | 0.0776 | 0.0353 | 0.0613 | 1.368 |
| lr | Logistic Regression | 0.7954 | 0.6784 | 0.025 | 0.398 | 0.0471 | 0.0236 | 0.0552 | 5.518 |
| knn | K Neighbors Classifier | 0.7778 | 0.6744 | 0.2076 | 0.403 | 0.2739 | 0.1583 | 0.1705 | 9.431 |
| nb | Naive Bayes | 0.2345 | 0.5988 | 0.9611 | 0.204 | 0.3365 | 0.005 | 0.0206 | 0.087 |
| dt | Decision Tree Classifier | 0.7344 | 0.5862 | 0.3361 | 0.3404 | 0.3382 | 0.1721 | 0.1721 | 0.482 |
| qda | Quadratic Discriminant Analysis | 0.5235 | 0.5083 | 0.4827 | 0.2066 | 0.2772 | 0.0098 | 0.0144 | 7.784 |
| svm | SVM - Linear Kernel | 0.7981 | 0 | 0 | 0 | 0 | 0 | 0 | 0.218 |
| ridge | Ridge Classifier | 0.7976 | 0 | 0.0092 | 0.4512 | 0.018 | 0.01 | 0.0396 | 0.077 |
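
This summary doesn't spell out how the extra no-shows were generated. As one common stand-in, here is a minimal sketch using SMOTE from imbalanced-learn to oversample the minority class; the repo's actual generator may differ:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

df = pd.read_csv("KaggleV2-May-2016.csv")  # filename assumed, as above

# Keep numeric features only for this sketch; a full pipeline would also
# encode the categorical and date columns before resampling.
X = df.select_dtypes("number")
y = df["No-show"]

# Interpolate new minority-class (no-show) rows until the classes balance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(pd.Series(y_res).value_counts())
```

With the classes balanced this way (or with whatever generator the notebook actually uses), the benchmark is re-run to produce the scores below.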

Results with the synthetically augmented dataset:

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| catboost | CatBoost Classifier | 0.86 | 0.9333 | 0.7607 | 0.9283 | 0.7721 | 0.7213 | 0.7438 | 20.435 |
| xgboost | Extreme Gradient Boosting | 0.8552 | 0.93 | 0.7629 | 0.9094 | 0.7726 | 0.7116 | 0.7324 | 8.432 |
| rf | Random Forest Classifier | 0.8521 | 0.9292 | 0.7843 | 0.8887 | 0.7896 | 0.7052 | 0.7262 | 6.809 |
| lightgbm | Light Gradient Boosting Machine | 0.8534 | 0.9273 | 0.7582 | 0.9005 | 0.7707 | 0.7079 | 0.7257 | 1.061 |
| et | Extra Trees Classifier | 0.8393 | 0.9183 | 0.8018 | 0.8482 | 0.7943 | 0.6794 | 0.6974 | 11.339 |
| gbc | Gradient Boosting Classifier | 0.8092 | 0.9063 | 0.7941 | 0.8015 | 0.7782 | 0.619 | 0.6325 | 6.779 |
| knn | K Neighbors Classifier | 0.8018 | 0.8786 | 0.7211 | 0.8423 | 0.7494 | 0.6045 | 0.6188 | 22.901 |
| ada | Ada Boost Classifier | 0.7615 | 0.8589 | 0.776 | 0.7462 | 0.7521 | 0.5231 | 0.5322 | 1.469 |
| lr | Logistic Regression | 0.7455 | 0.8153 | 0.7272 | 0.7437 | 0.7262 | 0.4913 | 0.4969 | 5.689 |
| lda | Linear Discriminant Analysis | 0.7457 | 0.8148 | 0.7273 | 0.7446 | 0.727 | 0.4918 | 0.4973 | 2.22 |
| dt | Decision Tree Classifier | 0.8098 | 0.8101 | 0.796 | 0.8 | 0.7767 | 0.6201 | 0.6349 | 0.719 |
| nb | Naive Bayes | 0.5981 | 0.7097 | 0.2909 | 0.7307 | 0.4108 | 0.1992 | 0.2414 | 0.125 |
| qda | Quadratic Discriminant Analysis | 0.5032 | 0.5046 | 0.2655 | 0.5167 | 0.3194 | 0.0092 | 0.0121 | 2.023 |
| svm | SVM - Linear Kernel | 0.7413 | 0 | 0.7165 | 0.745 | 0.7236 | 0.483 | 0.4875 | 0.374 |
| ridge | Ridge Classifier | 0.7457 | 0 | 0.7273 | 0.7446 | 0.727 | 0.4918 | 0.4973 | 0.088 |