Data Analysis on Diabetes Dataset Using Python and Tableau

NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases) research creates knowledge about and treatments for the most chronic, costly, and consequential diseases. The dataset used in this project originally comes from the NIDDK. It consists of several medical predictor variables and one target variable (Outcome). The predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and more.

Variables Description

Pregnancies -> Number of times pregnant

Glucose -> Plasma glucose concentration at 2 hours in an oral glucose tolerance test

BloodPressure -> Diastolic blood pressure (mm Hg)

SkinThickness -> Triceps skinfold thickness (mm)

Insulin -> Two-hour serum insulin (mu U/ml)

BMI -> Body Mass Index (weight in kg / (height in m)^2)

DiabetesPedigreeFunction -> Diabetes pedigree function, a score of the likelihood of diabetes based on family history

Age -> Age in years

Outcome -> Class variable (0 or 1). 268 of the 768 values are 1 (diabetic) and the remaining 500 are 0 (non-diabetic)
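A minimal sketch of loading the dataset with pandas and checking it against the variable description above. It assumes a CSV file with a header row that uses exactly the column names listed; the file name `diabetes.csv` and the helpers `load_diabetes` and `outcome_counts` are hypothetical, not part of the original project.

```python
import pandas as pd

# Column names as listed in the variable description (8 predictors + Outcome).
COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_diabetes(source) -> pd.DataFrame:
    """Read the dataset from a path or file-like object (hypothetical helper).

    Raises ValueError if any expected column is absent, and returns the
    columns in the documented order.
    """
    df = pd.read_csv(source)
    missing = set(COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df[COLUMNS]


def outcome_counts(df: pd.DataFrame) -> dict:
    """Return {0: non-diabetic count, 1: diabetic count}."""
    counts = df["Outcome"].value_counts()
    return {0: int(counts.get(0, 0)), 1: int(counts.get(1, 0))}
```

If the file matches the description above, `outcome_counts(load_diabetes("diabetes.csv"))` should return `{0: 500, 1: 268}`.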

Data Analytics Using Python

  1. Descriptive analysis of the variables and their values. In the columns below, a value of zero is not physiologically meaningful and therefore indicates a missing value:

• Glucose

• BloodPressure

• SkinThickness

• Insulin

• BMI

  2. A count (frequency) plot of the data types, showing how many variables have each type.

  3. Histograms of the variables, followed by treatment of the missing values based on each distribution.

  4. A plot of the count of each Outcome value to check how balanced the data is, with a description of the findings and a plan for a future course of action.

  5. Scatter plots between pairs of variables to understand their relationships.

  6. Correlation analysis using a heat map.
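The Python steps above can be sketched with pandas and matplotlib as follows. This is an illustrative outline under stated assumptions, not the project's actual notebook: the helper names are hypothetical, median imputation is an assumed choice for "treating the missing values" (the original does not say which method was used), and the plots use plain matplotlib rather than any particular plotting library the author may have used.

```python
import numpy as np
import pandas as pd

# Columns where a zero cannot be a real measurement (item 1).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def mark_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which zeros in ZERO_AS_MISSING become NaN."""
    out = df.copy()
    out[ZERO_AS_MISSING] = out[ZERO_AS_MISSING].replace(0, np.nan)
    return out


def dtype_counts(df: pd.DataFrame) -> pd.Series:
    """Number of columns per data type: the data behind the count plot (item 2)."""
    return df.dtypes.astype(str).value_counts()


def impute_median(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs with each column's median (assumed treatment, item 3).

    The median is less sensitive to skewed distributions than the mean.
    """
    return df.fillna(df.median(numeric_only=True))


def class_balance(df: pd.DataFrame) -> pd.Series:
    """Share of each Outcome value (item 4)."""
    return df["Outcome"].value_counts(normalize=True).sort_index()


def plot_overview(df: pd.DataFrame) -> None:
    """Draw the charts for items 2 to 6. Requires matplotlib."""
    import matplotlib.pyplot as plt
    from pandas.plotting import scatter_matrix

    fig, ax = plt.subplots()
    dtype_counts(df).plot(kind="bar", ax=ax, title="Columns per data type")  # item 2
    df.hist(bins=20, figsize=(12, 10))  # item 3: one histogram per column
    fig, ax = plt.subplots()
    df["Outcome"].value_counts().sort_index().plot(
        kind="bar", ax=ax, title="Outcome counts"  # item 4
    )
    scatter_matrix(df.drop(columns="Outcome"), figsize=(14, 14), diagonal="hist")  # item 5

    # Item 6: Pearson correlations (NaNs are excluded pair by pair).
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(9, 7))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    for i in range(len(corr)):
        for j in range(len(corr)):
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center", fontsize=7)
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    plt.show()
```

A typical call order is `clean = impute_median(mark_missing(raw))` followed by `plot_overview(clean)`. With the class split stated in the variable description, `class_balance` would report roughly 0.651 for 0 and 0.349 for 1 (500/768 and 268/768), a moderate imbalance worth keeping in mind for any later modelling.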

Data Analytics Using Tableau

a. Pie chart showing the proportions of the diabetic and non-diabetic population

b. Histogram or frequency chart to analyze the distribution of the data

c. Bins for the age values (20-25, 25-30, 30-35, etc.), used to analyze different variables across these age brackets with a bubble chart

d. Heatmap of the correlations among the relevant variables
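Tableau builds the age bins through its own interface. As an optional cross-check, the same 5-year brackets can be reproduced in pandas. In this sketch the helpers `age_brackets` and `bracket_summary` are hypothetical, the brackets are assumed to be half-open (an age of 25 falls in 25-30, not 20-25), and Glucose and BMI are only example variables to summarize per bracket.

```python
import numpy as np
import pandas as pd


def age_brackets(ages: pd.Series, width: int = 5, start: int = 20) -> pd.Series:
    """Label each age with a half-open bracket [lo, lo + width), e.g. '20-25'.

    Ages below `start` become NaN; the last edge is placed past the oldest age.
    """
    stop = int(ages.max()) + width + 1
    edges = np.arange(start, stop, width)
    labels = [f"{lo}-{lo + width}" for lo in edges[:-1]]
    return pd.cut(ages, bins=edges, right=False, labels=labels)


def bracket_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-bracket values that could feed a bubble chart, for example
    bubble size = Patients and colour = DiabeticRate (an assumed mapping)."""
    grouped = df.assign(AgeBracket=age_brackets(df["Age"])).groupby(
        "AgeBracket", observed=True  # drop brackets with no patients
    )
    return grouped.agg(
        Patients=("Outcome", "count"),
        DiabeticRate=("Outcome", "mean"),
        MeanGlucose=("Glucose", "mean"),
        MeanBMI=("BMI", "mean"),
    )
```

Comparing `bracket_summary(df)` with the Tableau bubble chart is a quick way to confirm that both tools put boundary ages such as 25 or 30 in the same bracket.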

Tableau Dashboard Link

https://public.tableau.com/views/DiabetesDataAnalysis_16721389003210/DiabetesDataAnalysis?:language=en-GB&publish=yes&:display_count=n&:origin=viz_share_link