Health Insurance Cost Forecast

Status: Completed

Purpose

The purpose of this analysis is to predict the health insurance costs charged to individuals based on their age, sex, BMI, number of children, smoking status, and region.

Methods Used

Supervised Machine Learning
Inferential Statistics
Descriptive Statistics
Machine Learning
Data Visualization
Predictive Modeling
Regression Analysis
Factor Analysis
Random Forest

Technologies

Python
R
Jupyter Notebook
Pandas
NumPy
Matplotlib
Scikit-learn
Graphviz
Seaborn
Yellowbrick
Pydot

Needs of this project

Data exploration/descriptive statistics
Data processing/cleaning
Statistical modeling
Writeup/reporting

Data Source

Kaggle: https://www.kaggle.com/mirichoi0218/insurance

Data Content

Age: Age of the beneficiary in years.
Sex: Whether the beneficiary is male or female.
BMI: Body mass index, calculated as weight in kilograms divided by the square of height in meters (kg/m²). A healthy BMI is generally considered to be 18.5 to 24.9.
Children: Number of children (dependents) covered by the health insurance.
Smoker: Whether or not the beneficiary smokes.
Region: The beneficiary's residential area in the US: northeast, southeast, southwest, or northwest.
Charges: The amount, in USD, that the beneficiary pays the health insurance company.

Note: In these definitions, "beneficiary" refers to the individual paying for the health insurance.
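The BMI definition above reduces to a one-line formula. The sketch below assumes metric inputs (kilograms and meters); the helper names `bmi` and `is_healthy_bmi` are hypothetical and not part of the data set, which already provides a precomputed bmi column.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def is_healthy_bmi(value: float) -> bool:
    """True if the BMI falls in the commonly cited healthy range of 18.5 to 24.9."""
    return 18.5 <= value <= 24.9


# 70 kg at 1.75 m gives a BMI of about 22.9, inside the healthy range.
example = bmi(70, 1.75)
print(round(example, 1), is_healthy_bmi(example))
```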

Underlying Assumptions

To be usable in practice, the model should satisfy the assumptions of linear regression. To confirm this, we checked the fitted model for the following:

The regression model is linear in parameters
The mean of residuals is zero
Homoscedasticity (equal variance) of residuals
Normality of residuals
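The residual checks listed above can be approximated numerically. This is a minimal sketch using NumPy on synthetic data, not the insurance data set; the function names `fit_ols` and `residual_diagnostics` are hypothetical, and the quantities returned are rough indicators rather than formal tests. Linearity in parameters holds by the form of the model and is usually confirmed visually with a residuals-versus-fitted plot; formal tests such as Breusch-Pagan (homoscedasticity) or Shapiro-Wilk (normality) would be more rigorous choices.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended to X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ coef
    return coef, fitted, y - fitted


def residual_diagnostics(fitted, resid):
    """Rough numeric indicators for three linear-regression assumptions."""
    z = (resid - resid.mean()) / resid.std()
    return {
        # Mean of residuals: with an intercept, OLS makes this ~0 by construction.
        "mean_resid": float(resid.mean()),
        # Homoscedasticity: squared residuals should not trend with fitted values.
        "corr_sq_resid_fitted": float(np.corrcoef(fitted, resid ** 2)[0, 1]),
        # Normality: skewness and excess kurtosis should both be close to 0.
        "skewness": float(np.mean(z ** 3)),
        "excess_kurtosis": float(np.mean(z ** 4) - 3.0),
    }


rng = np.random.default_rng(0)
X = rng.uniform(18, 64, size=(500, 1))                 # synthetic, age-like feature
y = 250 * X[:, 0] + 3000 + rng.normal(0, 1000, 500)    # linear signal + Gaussian noise
coef, fitted, resid = fit_ols(X, y)
print(residual_diagnostics(fitted, resid))
```

Because the synthetic data are generated to satisfy the assumptions, all four indicators should come out near zero; on real data, large values point to the assumption worth investigating with plots.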

ML Algorithm

Multiple linear regression (supervised learning)

Data validation (exploratory data analysis and data cleaning):

Use pandas.crosstab on the categorical variables (sex, smoker, region) to confirm their values
Check the categorical values for typos
Round dollar amounts (charges) to two decimal places
Check the range of age
Identify and correct incorrect entries
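The validation steps above might look like the following in pandas. This is a sketch under stated assumptions: the four rows are hypothetical (not taken from the Kaggle file) and include a planted typo and an implausible age, and the 18 to 100 age bounds are an illustrative choice rather than a rule from the data source.

```python
import pandas as pd

# Hypothetical sample rows using the data set's column names; "Male " (with a
# stray capital and trailing space) and the age of 250 are planted errors.
df = pd.DataFrame({
    "age": [19, 33, 250, 45],
    "sex": ["female", "male", "Male ", "female"],
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "northwest", "southeast", "northeast"],
    "charges": [1234.5678, 2500.0, 4321.999, 800.1234],
})

# 1. Cross-tabulate categorical variables; a typo shows up as an extra row.
print(pd.crosstab(df["sex"], df["smoker"]))
print(pd.crosstab(df["region"], df["smoker"]))

# 2. Flag values outside the expected category sets.
EXPECTED = {
    "sex": {"female", "male"},
    "smoker": {"yes", "no"},
    "region": {"northeast", "southeast", "southwest", "northwest"},
}
for col, allowed in EXPECTED.items():
    bad = df.loc[~df[col].isin(allowed), col]
    if not bad.empty:
        print(f"{col}: unexpected values {sorted(bad.unique())}")

# Normalize case and whitespace, which fixes typos such as "Male ".
for col in EXPECTED:
    df[col] = df[col].str.strip().str.lower()

# 3. Check the age range (bounds are an illustrative assumption).
out_of_range = df[~df["age"].between(18, 100)]
print("Rows with implausible age:")
print(out_of_range)

# 4. Charges are in dollars, so round them to cents.
df["charges"] = df["charges"].round(2)
print(df)
```

Implausible ages are only flagged here, not changed; whether to drop or correct such rows depends on how the error arose.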

Other Contributing Members

Contact

Jason.Zelaya474@gmail.com

jasonzelaya/Insurance-Forecast