
Thesis-Code

This repository contains the R code, Oracle SQL code, and thesis documentation.

ABSTRACT

Type II diabetes occurs at a higher rate in African Americans than in non-Hispanic whites, and African Americans also have higher diabetes mortality than other ethnic groups. With advances in medical and computational technology, more medical data is available than ever before.

The objective of this project is to perform cluster analysis on anonymized type II diabetes data from Howard University Hospital’s electronic health records (EHRs).

The data was first extracted from an Oracle database with SQL, then cleaned, preprocessed, and loaded into R. Four clustering algorithms were each used to partition the data into two, three, four, and five clusters, and the resulting clusterings were compared. DIANA (DIvisive ANAlysis) was found to cluster the data best, so the findings were drawn from its clusters.
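The abstract does not name the other three algorithms or the comparison criterion, and the repository's R code is not reproduced here. To illustrate how DIANA itself works, the Python sketch below implements its divisive splitting rule. It starts with one cluster holding every record. It then repeatedly splits the cluster with the largest diameter by moving objects into a "splinter group." It stops after k − 1 splits, which yields the two- to five-cluster partitions described above. The Euclidean distance, the function names (`diana`, `split_cluster`, `diameter`), and the toy data are assumptions for illustration only. In R, the `cluster` package's `diana()` function is the usual implementation.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X (an assumed metric)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def diameter(D, members):
    """Largest dissimilarity between any two members of a cluster."""
    return D[np.ix_(members, members)].max()


def split_cluster(D, members):
    """Split one cluster into a splinter group and a remainder (DIANA rule)."""
    remaining = list(members)
    # Seed the splinter group with the object whose average dissimilarity
    # to the other members is largest.
    avg = [D[i, [j for j in remaining if j != i]].mean() for i in remaining]
    splinter = [remaining.pop(int(np.argmax(avg)))]
    while len(remaining) > 1:
        # For each object: mean distance to the rest of the old group minus
        # mean distance to the splinter group. A positive value means the
        # object sits closer to the splinter group.
        diffs = [
            D[i, [j for j in remaining if j != i]].mean() - D[i, splinter].mean()
            for i in remaining
        ]
        best = int(np.argmax(diffs))
        if diffs[best] <= 0:
            break  # no object prefers the splinter group
        splinter.append(remaining.pop(best))
    return splinter, remaining


def diana(X, k):
    """Cluster labels after k - 1 divisive splits (at most k clusters)."""
    D = pairwise_distances(np.asarray(X, dtype=float))
    clusters = [list(range(len(D)))]
    while len(clusters) < k:
        # DIANA always splits the cluster with the largest diameter next.
        idx = max(range(len(clusters)), key=lambda c: diameter(D, clusters[c]))
        if diameter(D, clusters[idx]) == 0:
            break  # only singletons or identical points remain
        clusters.extend(split_cluster(D, clusters.pop(idx)))
    labels = np.empty(len(D), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


if __name__ == "__main__":
    # Toy two-feature data standing in for preprocessed patient records.
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    for k in range(2, 6):  # two to five clusters, as in the abstract
        print(k, diana(X, k))
```

DIANA always splits the cluster with the largest diameter, and sub-clusters can never be wider than their parent. So stopping after k − 1 splits should give the same partition as cutting the full DIANA tree into k groups, apart from ties.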

There were high correlations among type II diabetes, hypertension, hyperlipidemia, and hypercholesterolemia, which agrees with existing knowledge about the African Americans most at risk for diabetes. There was also evidence of higher rates of benign neoplasm of the colon (non-cancerous colon tumors). Other chronic diseases differed by gender and marital status. Acquired hypothyroidism was significantly more common among black, non-single women. Malignant neoplasm of the prostate (prostate cancer) occurred at elevated rates among black, non-single men. Tobacco use disorder was also more common in clusters made up mostly of single men and women.

Many of these relationships remain unexplored. Cluster analysis of electronic health records has enormous potential as a research method. With growing computational power and the proliferation of data, there is a large opportunity to mine medical data for knowledge.
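Findings like these are typically read off cluster profiles, meaning the share of patients in each cluster who carry a given diagnosis. The Python sketch below computes such a profile. It assumes each patient is encoded as a row of 0/1 diagnosis flags. The function name `cluster_profiles`, the condition names, and the toy data are hypothetical and are not taken from the thesis.

```python
import numpy as np


def cluster_profiles(labels, flags, names):
    """Return {cluster: {condition: share of patients with that condition}}.

    labels: cluster label per patient (e.g. from a DIANA partition).
    flags:  0/1 matrix, one row per patient, one column per condition.
    names:  condition names matching the columns of flags.
    """
    labels = np.asarray(labels)
    flags = np.asarray(flags, dtype=float)
    profiles = {}
    for c in np.unique(labels):
        # Column means of a 0/1 matrix give the prevalence within the cluster.
        shares = flags[labels == c].mean(axis=0)
        profiles[int(c)] = dict(zip(names, shares.round(3).tolist()))
    return profiles


if __name__ == "__main__":
    # Made-up example: five patients, two clusters, two condition flags.
    labels = [0, 0, 0, 1, 1]
    flags = [[1, 0], [1, 1], [0, 0], [0, 1], [0, 1]]
    print(cluster_profiles(labels, flags, ["hypothyroidism", "tobacco_use"]))
```

Comparing these shares across clusters, alongside each cluster's gender and marital-status mix, is one way to surface patterns like the ones listed above.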