Exploratory Data Analysis of the US Healthcare Insurance Dataset

Overview

The aim of this project is to conduct an exploratory data analysis on the US Healthcare Insurance dataset. The dataset includes information about insurance charges of US citizens along with demographic information such as age, smoking status, BMI, number of children, sex, and region. The project focuses on answering four main research questions:

  1. Is there a correlation between smoking status and insurance charges? Are smokers charged more than non-smokers?
  2. Is there a correlation between BMI and insurance charges? Are overweight or obese individuals charged more for insurance?
  3. Who is more likely to receive insurance charges over $16,700: smokers or non-smokers?
  4. Who is more likely to receive insurance charges over $16,700: individuals with a BMI below 25 or those with a BMI above 25?
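Questions 3 and 4 ask for conditional probabilities: the share of people in a group whose charges exceed $16,700. The analysis itself was done in Excel. The Python sketch below (standard library only) shows one way to compute the same relative frequencies. The `Record` field names and the five sample rows are hypothetical placeholders, not values from the dataset.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

THRESHOLD = 16_700  # charge threshold used in the research questions


@dataclass
class Record:
    # Hypothetical field names; the real spreadsheet columns may differ.
    smoker: bool
    bmi: float
    charges: float


def conditional_probability(
    records: Iterable[Record],
    condition: Callable[[Record], bool],
    threshold: float = THRESHOLD,
) -> float:
    """Estimate P(charges > threshold | condition) as a relative frequency."""
    group = [r for r in records if condition(r)]
    if not group:
        raise ValueError("no records satisfy the condition")
    above = sum(1 for r in group if r.charges > threshold)
    return above / len(group)


# Toy records for illustration only; these are NOT values from the dataset.
sample = [
    Record(smoker=True, bmi=31.0, charges=39_000.0),
    Record(smoker=True, bmi=22.5, charges=15_500.0),
    Record(smoker=False, bmi=28.4, charges=6_200.0),
    Record(smoker=False, bmi=23.1, charges=3_800.0),
    Record(smoker=False, bmi=33.7, charges=18_100.0),
]

p_smoker = conditional_probability(sample, lambda r: r.smoker)
p_non_smoker = conditional_probability(sample, lambda r: not r.smoker)
p_bmi_over_25 = conditional_probability(sample, lambda r: r.bmi > 25)
p_bmi_under_25 = conditional_probability(sample, lambda r: r.bmi < 25)
# Combined condition: smoker AND BMI above 25.
p_smoker_bmi_over_25 = conditional_probability(
    sample, lambda r: r.smoker and r.bmi > 25
)

print(f"P(>16,700 | smoker)          = {p_smoker:.2f}")
print(f"P(>16,700 | non-smoker)      = {p_non_smoker:.2f}")
print(f"P(>16,700 | BMI > 25)        = {p_bmi_over_25:.2f}")
print(f"P(>16,700 | BMI < 25)        = {p_bmi_under_25:.2f}")
print(f"P(>16,700 | smoker, BMI > 25) = {p_smoker_bmi_over_25:.2f}")
```

The strict `>` comparison and the decision to leave BMI exactly 25 out of both BMI groups are assumptions of this sketch; the original workbook may draw those boundaries differently.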

The exploratory data analysis was conducted in Excel, and the workbook is attached. The analysis used descriptive statistics, probability, and hypothesis testing.
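The README does not say which hypothesis test was run. One common choice for comparing the mean charges of two groups whose variances may differ is Welch's t-test; Excel's `T.TEST` function with type 3 is the unequal-variance two-sample form. The sketch below assumes that test and computes only the t statistic and the Welch-Satterthwaite degrees of freedom using the standard library. The sample charges are made up for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


# Toy charges for illustration only; these are NOT values from the dataset.
smokers = [39_000.0, 15_500.0, 36_000.0, 42_000.0]
non_smokers = [6_200.0, 3_800.0, 18_100.0, 9_000.0]

t_stat, dof = welch_t(smokers, non_smokers)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Turning `t` into a p-value requires the Student t distribution (for example `scipy.stats.t`), which is left out to keep the sketch dependency-free.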

Key Findings

The analysis revealed the following key findings:

  1. There is a strong positive correlation between smoking status and insurance charges. Smokers are charged more than non-smokers.
  2. There is a weak positive correlation between BMI and insurance charges. Overweight or obese individuals (BMI above 25) are charged more than those with a BMI below 25.
  3. Smokers are much more likely to receive insurance charges over $16,700: the probability is 92.7% for smokers, compared with about 8% (0.08) for non-smokers.
  4. Individuals with a BMI higher than 25 are more likely to receive insurance charges over $16,700, with a probability of 26% compared with 21% for those with a BMI below 25. The difference is only 5 percentage points.
  5. If an individual with a BMI higher than 25 is also a smoker, the probability of receiving insurance charges over $16,700 rises to 98%. In other words, out of 100 people with both characteristics, only about two would have charges below $16,700. For non-smokers, the probability remains low at about 8% (0.08).
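Because smoking status is binary, its correlation with charges can be computed by coding smokers as 1 and non-smokers as 0 and taking the Pearson coefficient; this is the point-biserial correlation. Whether the Excel analysis used exactly this coding is an assumption. The sketch below uses the standard library only, and the rows are illustrative placeholders, not dataset values.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Toy rows for illustration only: (smoker, bmi, charges); NOT dataset values.
rows = [
    ("yes", 31.0, 39_000.0),
    ("yes", 22.5, 15_500.0),
    ("no", 28.4, 6_200.0),
    ("no", 23.1, 3_800.0),
    ("no", 33.7, 18_100.0),
]
smoker_coded = [1.0 if s == "yes" else 0.0 for s, _, _ in rows]
bmi = [b for _, b, _ in rows]
charges = [c for _, _, c in rows]

r_smoking = pearson(smoker_coded, charges)  # point-biserial correlation
r_bmi = pearson(bmi, charges)
print(f"r(smoking, charges) = {r_smoking:.3f}")
print(f"r(BMI, charges)     = {r_bmi:.3f}")
```

On Python 3.10 and later, `statistics.correlation` returns the same Pearson coefficient; the hand-written version is kept here so each step of the formula is visible.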

For a more comprehensive report, see the accompanying Medium article: https://medium.com/@adindazr/who-is-more-likely-to-have-higher-medical-charges-a-smoker-or-an-obese-person-8f8bbdaaa637

Visualization

The interactive Tableau dashboard is available at: https://public.tableau.com/views/SmokingObesityInfluencedonMedicalInsurances/Dashboard1?:language=en-GB&:display_count=n&:origin=viz_share_link

Video

Project: https://youtu.be/I75SefgI3BU
Theory: https://youtu.be/jcmhF_-tCi8