/insurance_charges_estimation

regression analysis for insurance charges

Primary Language: R

insurance charge estimation

insurance charge estimation based on regression analysis

Abstract

We use a Kaggle dataset that relates individual medical expenditure to several mutually independent factors. Based on this dataset, we analyze how factors such as age and BMI relate to healthcare expenditure, and then build regression models on top of that analysis. For the nonsmoker category, we fit a simple linear regression; on the test set, a fraction of 0.9 of the observations fall within the model's prediction interval. The smoker category is split into a high-charge group and a low-charge group, and a multiple linear regression is fitted to each. Both models reach an $R^{2}$ score above 0.9. We conclude that health expenditure is linearly related to age in every group, and that the effect of BMI on health expenditure depends on smoking status.
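The workflow described in the abstract can be sketched in R roughly as follows. This is a minimal illustration, not the code behind the report. The data are synthetic, and the column names (`age`, `bmi`, `smoker`, `charges`), the 80/20 train/test split, the 95% prediction level, and the BMI cutoff of 30 used to separate the two smoker groups are all assumptions made for the example.

```r
# Synthetic stand-in data; NOT the Kaggle dataset and not the report's results.
set.seed(1)
n <- 600
dat <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = round(rnorm(n, mean = 30, sd = 6), 1),
  smoker = sample(c("yes", "no"), n, replace = TRUE, prob = c(0.3, 0.7))
)

# Made-up charge mechanism, used only so the models have something to fit.
dat$charges <- with(dat, ifelse(
  smoker == "no",
  250 * age + rnorm(n, sd = 1500),
  250 * age + 500 * bmi + ifelse(bmi >= 30, 15000, 0) + rnorm(n, sd = 1500)
))

# Nonsmokers: simple linear regression of charges on age.
nonsmk   <- dat[dat$smoker == "no", ]
train_id <- sample(nrow(nonsmk), size = floor(0.8 * nrow(nonsmk)))
train    <- nonsmk[train_id, ]
test     <- nonsmk[-train_id, ]
fit_ns   <- lm(charges ~ age, data = train)

# Share of held-out charges that fall inside the 95% prediction interval.
pi_test  <- predict(fit_ns, newdata = test, interval = "prediction", level = 0.95)
coverage <- mean(test$charges >= pi_test[, "lwr"] & test$charges <= pi_test[, "upr"])

# Smokers: one multiple linear regression per group.
# The BMI >= 30 rule is a placeholder for the high-charge / low-charge split.
smk    <- dat[dat$smoker == "yes", ]
groups <- split(smk, ifelse(smk$bmi >= 30, "high", "low"))
fits   <- lapply(groups, function(g) lm(charges ~ age + bmi, data = g))
r2     <- sapply(fits, function(f) summary(f)$r.squared)

print(coverage)  # fraction of test points inside the prediction interval
print(r2)        # R^2 for the "high" and "low" smoker groups
```

With the real dataset, `dat` would be replaced by the loaded Kaggle table, and the grouping rule by whichever criterion the report uses to separate the smoker groups.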

Reference

Find the PDF here: report