
An ensemble model for predicting diabetes onset using NHANES Data


An Ensemble Model for Predicting the Onset of Diabetes using NHANES Data

By John Semerdjian & Spencer Frank

Code

Our models are contained in the NHANES.ipynb notebook. To run the notebook, create a virtual environment, install the required modules, and download the data.

# create a virtual environment named "nhanes" (mkvirtualenv comes from virtualenvwrapper; adjust the python3 path for your system)
$ mkvirtualenv --python=/usr/local/bin/python3 nhanes
$ workon nhanes

# install required modules
$ pip install -r requirements.txt

# download/merge data
$ python ./bootstrap.py

# start the notebook server (newer IPython/Jupyter releases use "jupyter notebook" instead)
$ ipython notebook

Video & Report

You can find our report here.

Abstract

Predicting disease onset from patient survey and lifestyle data is quickly becoming an important tool for diagnosing a disease before it progresses. In this study, data from the National Health and Nutrition Examination Survey (NHANES) questionnaire are used to predict the onset of diabetes. An ensemble model that combines the outputs of several classification algorithms was developed to predict diabetes onset from 16 features. The ensemble model achieved an area under the ROC curve (AUC) of 0.834, indicating strong discriminative performance.
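The abstract does not say how the classifiers' outputs are combined. The following is a minimal sketch that assumes simple soft voting, where each model's predicted probability of diabetes is averaged. This may differ from the ensemble actually built in NHANES.ipynb. The model names and toy scores are hypothetical and are not NHANES results. The sketch also shows what an AUC such as 0.834 measures: the probability that a randomly chosen positive case is scored above a randomly chosen negative one.

```python
from typing import Dict, List, Optional, Sequence


def soft_vote(probas: Dict[str, Sequence[float]],
              weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of each model's predicted probability of the positive
    class (assumed soft-voting scheme; equal weights when none are given)."""
    names = list(probas)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n = len(probas[names[0]])
    if any(len(probas[name]) != n for name in names):
        raise ValueError("all models must score the same respondents")
    return [
        sum(weights[name] * probas[name][i] for name in names) / total
        for i in range(n)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = developed diabetes, 0 = did not.
    labels = [1, 0, 1, 0, 0]
    probas = {
        "logistic_regression": [0.80, 0.30, 0.55, 0.20, 0.60],
        "random_forest": [0.70, 0.40, 0.65, 0.10, 0.30],
        "gradient_boosting": [0.90, 0.20, 0.45, 0.30, 0.45],
    }
    for name, p in probas.items():
        print(name, round(roc_auc(labels, p), 3))
    ensemble = soft_vote(probas)
    print("ensemble", round(roc_auc(labels, ensemble), 3))  # → ensemble 1.0
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration. For thousands of survey respondents, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.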

Features and Descriptions

Additional Variables