Medical Cost Personal Dataset Analysis

0. Introduction

Context, e.g. what the dataset is about and why studying it is important
Description of the variables, e.g. bmi: body mass index of the insured person

Plots of continuous and discrete variables (histograms, bar plots, density curves, box plots, scatter plots, etc.)
Identification of outliers
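
One common way to flag outliers is the 1.5 x IQR rule that box plots use. The sketch below assumes that rule; the outline does not say which method will be used. The function name iqr_outliers and the toy values are hypothetical and are not taken from the dataset.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Toy charges, invented for illustration only (not real dataset rows).
toy_charges = [1200, 1500, 1800, 2100, 2400, 2700, 3000, 48000]
print(iqr_outliers(toy_charges))  # prints [48000]
```

A flagged value is only a candidate: a very high charge may be a genuine observation rather than an error, so it is worth inspecting before removing anything.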

Correlation Analysis of all variables + pairplot()
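
As a minimal sketch of the correlation step, the Pearson coefficient can be computed by hand for every pair of numeric columns. The column names and toy values below are assumptions made up for illustration. In practice the same matrix would more likely come from pandas DataFrame.corr(), with seaborn's pairplot() for the visual counterpart.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each (name_a, name_b) pair to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy columns, invented for illustration only (not real dataset rows).
toy = {
    "age": [19, 28, 33, 45, 60],
    "bmi": [27.9, 33.0, 22.7, 30.1, 25.8],
    "charges": [1700, 4400, 5200, 8600, 13000],
}
for pair, r in correlation_matrix(toy).items():
    print(pair, round(r, 3))
```
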
Linear regression model (Insurance Forecast)
- Standard errors, p-values and confidence intervals for b0 and b1
- confidence bands
- prediction band
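
The quantities listed above can be sketched for a one-predictor model using the textbook least-squares formulas. This is an illustration under stated assumptions, not the analysis itself: the function names and toy data are hypothetical, and the critical value t_crit is passed in by hand (3.182 is the two-sided 95% value for 3 degrees of freedom). p-values additionally need the t distribution, for example scipy.stats.t.sf, which is left out here.

```python
from math import sqrt


def simple_ols(x, y):
    """Fit y = b0 + b1*x and return estimates with their standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = sqrt(sse / (n - 2))  # residual standard error
    return {
        "b0": b0, "b1": b1, "s": s, "n": n, "mx": mx, "sxx": sxx,
        "se_b0": s * sqrt(1 / n + mx ** 2 / sxx),
        "se_b1": s / sqrt(sxx),
    }


def band_half_widths(fit, x0, t_crit):
    """Half-widths of the confidence and prediction bands at x0."""
    core = 1 / fit["n"] + (x0 - fit["mx"]) ** 2 / fit["sxx"]
    conf = t_crit * fit["s"] * sqrt(core)      # band for the mean response
    pred = t_crit * fit["s"] * sqrt(1 + core)  # band for a new observation
    return conf, pred


# Toy data, invented for illustration only.
fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
t_crit = 3.182  # assumed table value: t(0.975, df = n - 2 = 3)
ci_b1 = (fit["b1"] - t_crit * fit["se_b1"], fit["b1"] + t_crit * fit["se_b1"])
print(fit["b0"], fit["b1"], ci_b1, band_half_widths(fit, 3, t_crit))
```

The prediction band is always wider than the confidence band because it adds the variance of a single new observation to the uncertainty of the fitted mean.
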
Diagnostic plots (to check that a regression can be performed)
- Residual vs fitted plot
- Residual QQ-plot
- Scale-location plot
- Residual vs leverage plot
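
Each of the four diagnostic plots is a scatter plot of quantities derived from the fit. As a rough sketch, the function below computes those quantities for a one-predictor model and leaves the plotting out. The function name, dictionary keys, toy data, and the (i - 0.5)/n plotting positions for the QQ-plot are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def regression_diagnostics(x, y):
    """Return the values plotted in the four standard diagnostic plots."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each point: diagonal of the hat matrix for simple regression.
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    std_resid = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    normal = NormalDist()
    return {
        "fitted": fitted,                                   # x-axis: residual vs fitted
        "residuals": residuals,                             # y-axis: residual vs fitted
        "qq_theoretical": [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)],
        "qq_sample": sorted(std_resid),                     # y-axis: QQ-plot
        "sqrt_abs_std_resid": [sqrt(abs(r)) for r in std_resid],  # scale-location
        "leverage": leverage,                               # x-axis: residual vs leverage
        "std_resid": std_resid,                             # y-axis: residual vs leverage
    }


# Toy data, invented for illustration only.
diag = regression_diagnostics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(diag["leverage"])
```
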
Linear regression with 3 parameters (beta0, beta1, beta2)
Linear regression with more than one variable ([BMI, age] --> Charges)
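
With two predictors the model has three coefficients, charges = beta0 + beta1*bmi + beta2*age, which can be estimated by solving the normal equations (X'X) beta = X'y. The sketch below is a minimal pure-Python illustration under that assumption. The helper names and toy rows are hypothetical, and a library routine such as numpy.linalg.lstsq would normally be preferred for numerical stability.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple(rows, y):
    """Least-squares coefficients [beta0, beta1, ...] for y on the given rows."""
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy (bmi, age) rows and charges, invented for illustration only.
toy_rows = [(22.0, 25), (27.0, 40), (31.0, 33), (24.0, 52), (35.0, 60)]
toy_charges = [100 + 50 * bmi + 20 * age for bmi, age in toy_rows]
beta0, beta1, beta2 = fit_multiple(toy_rows, toy_charges)
print(beta0, beta1, beta2)
```

Because the toy charges are generated from known coefficients without noise, the fit should recover them up to rounding, which makes the sketch easy to check before applying the same steps to real data.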