Using Machine Learning to Predict Super-Utilizers of Healthcare Services

Abstract

In this dissertation, I aim to forecast high utilizers of emergency care and inpatient Medicare services (i.e., healthcare visits). Through a literature review, I demonstrate that accurate and reliable prediction of these future high utilizers can not only reduce healthcare costs but also improve the overall quality of healthcare for patients. By identifying this at-risk population before high utilization manifests, I propose that there is still time to reverse undesirable healthcare trajectories (i.e., those of individuals whose rising clinical risk leads to an excessive healthcare and treatment burden) through timely attention and proper care coordination. My dissertation culminates in the delivery of state-of-the-art predictive models that exploit well-researched clinical, behavioral, and social determinants associated with increased inpatient and emergency care utilization. I discuss my contributions to applied machine learning in healthcare, examine ethical concerns common to similar machine learning tasks, and conclude by reviewing how this research can be advanced through future work.

Table of Contents

  1. Manuscript (ProQuest link will be added when published)
  2. Appendix A. Data Integrity Guidelines
  3. Appendix B. Data Dictionaries
  4. Appendix C. Exclusion Criteria
  5. Appendix D. Hyperparameters
  6. Appendix E. Performance Tables
  7. Appendix F. Feature Importances
  8. Appendix G. Shapley Values
  9. Appendix H. Disaggregation Analysis

Data Use Agreement

Note: In accordance with the CMS Data Use Agreement "Section 8b.," the User may not disclose the limited data set file(s) specified in section 4 of the Agreement to a Secondary User until and unless the Secondary User enters into a DUA with CMS. CMS would only enter into a DUA with a Secondary User if the purpose of the secondary use of the limited data set file(s) is consistent with the purpose specified in Section 3 of this Agreement. "Section B." states that the User represents that the limited data set files in section 4 have been used solely for the following research purpose: "Using Machine Learning to Predict High Utilizers of Healthcare." In adherence with "Section 9.," the User has agreed to establish appropriate administrative, technical, and physical safeguards to protect the confidentiality of the limited data set file(s) and to prevent unauthorized use of or access to them. The safeguards have provided a level and scope of security that is not less than the level and scope of security established by the Office of Management and Budget (OMB) in OMB Circular No. A-130, Appendix III, Security of Federal Automated Information Systems (previously http://www.whitehouse.gov/omb/circulars/a130/a130.html, now https://obamawhitehouse.archives.gov/sites/default/files/omb/assets/OMB/circulars/a130/a130revised.pdf), which sets forth guidelines for security plans for automated information systems in Federal agencies. The User acknowledges that unsecured telecommunications, including the Internet, must not be used to transmit individually identifiable or deducible information derived from the limited data set file(s), and that the data must not be physically moved or electronically transmitted in any way from the site indicated in section 15 without prior written approval from CMS. In conformance with these guidelines, the data is set to be destroyed with proof of receipt in May 2021.