Pima_Indians_Diabetes_IBM_SPSS_Modeler_and_Microsoft_Excel

The goal of the project is to diagnostically predict whether or not a patient has diabetes by using IBM SPSS Modeler and Microsoft Excel Data Analysis tool.

Microsoft Excel:

In the first part of the project, I analyzed the dataset, addressed missing values and generated visual charts by using the Microsoft Excel Data Analysis tool.
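The text does not say how the missing values were addressed in Excel. As a minimal sketch of one common approach, assuming missing entries are represented as `None` and are filled with the column median, the step could look like this in Python. The function name `impute_median` and the sample values are hypothetical and not taken from the project.

```python
from statistics import median


def impute_median(column):
    """Return a copy of `column` with None entries replaced by the median
    of the observed (non-None) values. Median imputation is an assumption;
    the project may have used a different strategy."""
    observed = [v for v in column if v is not None]
    if not observed:
        # Nothing to estimate a median from, so leave the column unchanged.
        return list(column)
    fill = median(observed)
    return [fill if v is None else v for v in column]


# Made-up values for illustration only, not taken from the dataset.
bmi = [33.6, None, 23.3, 28.1, None, 43.1]
print(impute_median(bmi))
```

The median is used here rather than the mean because it is less sensitive to extreme values, which matters when the same column may also contain outliers.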

For cholesterol:

Shot 0008

Correlation Matrix (Pearson):

Shot 0009
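For readers without Excel, the Pearson coefficients shown in a matrix like the one above can be reproduced with a short sketch. The helper names `pearson` and `correlation_matrix`, and the column values in the usage example, are hypothetical and only illustrate the calculation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and the two root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (dev_x * dev_y)


def correlation_matrix(columns):
    """Build a nested dict of pairwise Pearson coefficients.

    `columns` maps a column name to its list of numeric values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for illustration only, not taken from the dataset.
data = {"Glucose": [85, 148, 183, 89], "BMI": [26.6, 33.6, 23.3, 28.1]}
matrix = correlation_matrix(data)
print(matrix["Glucose"]["BMI"])
```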

Normal Probability Plot after handling missing values:

Shot 0012
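A normal probability plot pairs each sorted observation with the quantile expected under a normal distribution. The sketch below shows one common construction, assuming plotting positions of (i + 0.5) / n. Excel's chart may use a different convention, and the function name `normal_probability_points` is hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(values):
    """Return (theoretical_quantile, observed_value) pairs for a normal
    probability plot, using plotting positions (i + 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i + 0.5) / n), v)
        for i, v in enumerate(ordered)
    ]


# Made-up values for illustration only, not taken from the dataset.
for quantile, value in normal_probability_points([72, 66, 64, 40, 74]):
    print(round(quantile, 3), value)
```

If the points fall close to a straight line, the column is approximately normally distributed.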

IBM SPSS Modeler:

In the second part, the objective is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset, by using IBM SPSS Modeler. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

Pima Indians Diabetes Database can be downloaded from here.

The schema of the project in IBM SPSS Modeler:

1  Schema of the project

2  Schema of the project 2

Test scores before performing anomaly detection algorithms:

3  Before Performing Anomaly Detection Algorithms

Test scores after performing anomaly detection algorithms:

4  After Performing Anomaly Detection Algorithms
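The text does not name the anomaly detection algorithms that were used. To illustrate the general idea of removing unusual records before training, here is a minimal sketch of a simple z-score rule. This is a swapped-in technique, not the method SPSS Modeler applies, and the function name `flag_outliers` and the threshold of 3.0 are assumptions.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds `threshold`.

    A plain z-score rule, used here only as an illustration; it is not the
    algorithm applied in the project."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [i for i, v in enumerate(values) if abs(v - center) / spread > threshold]


# Made-up values for illustration only, not taken from the dataset.
insulin = [10] * 20 + [100]
print(flag_outliers(insulin))
```

Records flagged this way can be dropped or reviewed before the models are retrained, which is the kind of before and after comparison the two score tables show.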

I ran several models, including C5.0, CART, Random Forest, QUEST, and Regression. The CHAID (Chi-squared Automatic Interaction Detection) model gave the best test score:

5  Chi-Squared_Best_Model
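CHAID chooses each split by testing predictor categories against the target with a chi-squared test of independence. The sketch below computes only the Pearson chi-squared statistic for a contingency table. Full CHAID also merges categories and adjusts p-values, which is not shown here. The function name `chi_squared` and the counts are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table.

    `table` is a list of rows of observed counts; every row and column
    total is assumed to be greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up counts for illustration only:
# rows = glucose band (low, high), columns = Outcome (0, 1).
counts = [[40, 10], [15, 35]]
print(chi_squared(counts))
```

A larger statistic means the predictor's categories separate the target classes more strongly, which makes that predictor a better candidate for the split.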

The final schema of the project:

6  Final Schema

The hyperparameters of the model:

7  Hyperparameters

  • The top 3 decisive features in predicting the target were Glucose, Age, and BMI.