Machine-Learning-Risk-Estimation-ARC

Python code for administrative data on long-term care residents: odds ratios, logistic regression, and ML models

Primary language: Jupyter Notebook

DOI

Title: Machine Learning Risk Estimation and Prediction of Death in Continuing Care Facilities using Administrative Data.

This study aimed to identify factors associated with mortality among continuing care residents in Alberta during the coronavirus disease 2019 (COVID-19) pandemic. We linked several administrative datasets, compared pre-processing methods by their effect on prediction performance, and then developed several machine learning models and compared their performance.

We conducted a retrospective cohort study of all continuing care residents in Alberta, Canada, from March 1, 2020, to March 31, 2021. We used univariable and multivariable logistic regression (LR) models to identify predictors of 60-day all-cause mortality, estimating odds ratios (ORs) with 95% confidence intervals. The Youden index was used to select the cut-off point that maximizes the sum of sensitivity and specificity.

In this cohort, increased age, male sex, symptoms, previous admissions, and some specific comorbidities were associated with increased mortality. Machine learning and pre-processing approaches are potentially valuable for improving mortality risk prediction, but more work is needed to show improvement beyond standard risk factors.
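The two statistical steps named above can be illustrated with a minimal sketch. It is not taken from the notebooks in this repository: the function names `odds_ratio_with_ci` and `youden_cutoff` are hypothetical, and all counts, labels, and probabilities are synthetic. The first function computes an odds ratio with a Wald 95% confidence interval from a 2x2 table, which for a single binary predictor matches what a univariable logistic regression would estimate. The second picks the probability threshold that maximizes the Youden index, J = sensitivity + specificity - 1.

```python
import math


def odds_ratio_with_ci(exposed_died, exposed_survived,
                       unexposed_died, unexposed_survived, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval. All four cells must be > 0.
    """
    a, b, c, d = exposed_died, exposed_survived, unexposed_died, unexposed_survived
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


def youden_cutoff(y_true, y_prob):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its probability is >= threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(y_prob)):
        true_pos = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        true_neg = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Synthetic 2x2 table (hypothetical counts): rows = exposure, columns = outcome.
or_estimate, ci_lower, ci_upper = odds_ratio_with_ci(20, 80, 10, 90)

# Synthetic outcomes (1 = died within 60 days) and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
cutoff, j_stat = youden_cutoff(y_true, y_prob)

print(f"OR = {or_estimate:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
print(f"Youden cut-off = {cutoff}, J = {j_stat:.2f}")
```

For a multivariable model, the adjusted odds ratios would instead come from exponentiating the fitted logistic regression coefficients. The threshold search would then be applied to that model's predicted probabilities.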