Insurance_Premium_Prediction

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Insurance Premium Prediction Project

Build a solution that can predict a person's health insurance premium.


About Dataset

The insurance.csv dataset contains 1338 observations (rows) and 7 columns: 4 numerical features (age, bmi, children and expenses) and 3 nominal features (sex, smoker and region). The nominal features were converted into factors, with a numerical value assigned to each level.
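The factor conversion described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual preprocessing: the sample rows are hypothetical values invented for the example, the helper name `encode_rows` is an assumption, and assigning codes in alphabetical order of the levels is just one possible convention.

```python
import csv
import io

# Hypothetical sample rows using the column names of insurance.csv.
# The values are invented for illustration only.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,expenses
25,female,24.0,0,yes,southwest,15000.0
40,male,31.5,2,no,southeast,6000.0
33,male,28.2,1,no,northwest,4500.0
"""

NUMERIC = ("age", "bmi", "children", "expenses")
NOMINAL = ("sex", "smoker", "region")


def encode_rows(rows):
    """Cast numeric columns to float and map nominal levels to integer codes."""
    rows = list(rows)
    # One sorted list of levels per nominal column; the index is the code.
    levels = {col: sorted({row[col] for row in rows}) for col in NOMINAL}
    encoded = []
    for row in rows:
        out = {col: float(row[col]) for col in NUMERIC}
        for col in NOMINAL:
            out[col] = levels[col].index(row[col])
        encoded.append(out)
    return encoded, levels


encoded, levels = encode_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in encoded:
    print(row)
```

Returning the `levels` mapping alongside the encoded rows keeps the encoding reversible, which matters when the same codes have to be applied to new inputs at prediction time.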

Please check the link below:

Streamlit Deployment - [Current Live Link]: Streamlit-app


Elastic Beanstalk: [*Deployment Link Live-Beanstalk*](http://insurance-env-1.eba-ztjhym2p.ap-south-1.elasticbeanstalk.com/)

Documentation:

High Level Design

Low Level Design

Project Architecture

Project Wireframe

Detailed Project Report


Steps Taken:

  • Installed Python, VS Code and Git.
  • Created a Python 3.9 environment.
  • Installed the dependencies from the requirements file.
  • Created an account on MongoDB Atlas.
  • Downloaded the source dataset from Kaggle.
  • Framed the task as a regression problem with expenses as the target feature.
  • Deployed on AWS-EC2.

Data Cleaning:

  • Cleaned the data, which had a header issue, missing values, misplaced values and outliers.

EDA and Feature Engineering:

  • In this step, we apply Exploratory Data Analysis (EDA) to extract insights from the dataset and learn which features contribute most to predicting insurance expenses, using Pandas for data analysis and Matplotlib & Seaborn for data visualization.
  • Applied feature scaling with StandardScaler, which rescales each feature to zero mean and unit variance.
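As a rough illustration of what standard scaling does to a single feature, the sketch below subtracts the mean and divides by the population standard deviation, the same convention scikit-learn's StandardScaler uses. The `standardize` helper and the BMI values are hypothetical and exist only for this example.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical BMI values, invented for illustration.
bmi = [18.5, 22.0, 27.9, 33.8, 45.0]
scaled = standardize(bmi)

print([round(v, 3) for v in scaled])
print("mean:", round(fmean(scaled), 6), "std:", round(pstdev(scaled), 6))
```

Standardized values are centred on 0 but are not bounded: an observation far from the mean, such as the largest BMI here, ends up with a magnitude above 1.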

Model Building

  • Treated the task as a regression problem with expenses as the target feature.
  • Models used: Linear Regression, Random Forest, Decision Tree, AdaBoost and Gradient Boosting.
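A comparison across the listed model families might look like the sketch below, assuming scikit-learn and NumPy are installed. The data is synthetic (generated from a known linear relationship) so that the block is self-contained; the hyperparameters, split ratio and dictionary keys are assumptions made for the example and are not taken from the project.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real features and the expenses target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "ada_boost": AdaBoostRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit every model on the same split and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: R2 = {score:.3f}")
```

Scoring every candidate on the same held-out split keeps the comparison fair; the ranking on synthetic data says nothing about how the models rank on insurance.csv.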


Model Selection

  • Hyperparameter tuning with GridSearchCV was done for the regression models.
  • Regression metrics: R2 score, adjusted R2 and mean absolute error.
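The three metrics can be written out in a few lines of plain Python, which is useful mainly because adjusted R2 has no ready-made scikit-learn function. The sketch uses the usual definition, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1) for n samples and p features; the function names and the toy values are assumptions for illustration only.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2_value, n_samples, n_features):
    """Penalise R2 for the number of features used by the model."""
    return 1.0 - (1.0 - r2_value) * (n_samples - 1) / (n_samples - n_features - 1)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, invented for illustration.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 2.0, 2.5, 4.0, 5.5]

score = r2(y_true, y_pred)
print("R2:", score)
print("adjusted R2:", adjusted_r2(score, n_samples=len(y_true), n_features=2))
print("MAE:", mean_absolute_error(y_true, y_pred))
```

Adjusted R2 is never larger than R2 for a model with at least one feature, so it is the more conservative figure when comparing models that use different numbers of features.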

Flask, Docker and AWS Deployment:

  • Built a Flask app with a Dockerfile.
  • Deployed on AWS-EC2 with a CI/CD pipeline through GitHub Actions.
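A prediction endpoint for such a Flask app could be sketched as follows, assuming Flask is installed. The `/predict` route, the JSON payload shape and the `predict_expenses` placeholder are all hypothetical; the placeholder returns a dummy value where the project would call its trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Input features from the dataset (everything except the expenses target).
FEATURES = ("age", "sex", "bmi", "children", "smoker", "region")


def predict_expenses(features):
    """Hypothetical placeholder for the trained model; returns a dummy value."""
    return 0.0


@app.route("/predict", methods=["POST"])
def predict():
    # Treat a missing or malformed JSON body as an empty payload.
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    features = {name: payload[name] for name in FEATURES}
    return jsonify({"expenses": predict_expenses(features)})
```

Validating the payload before calling the model means a malformed request gets a clear 400 response instead of an unhandled exception inside the container.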

MLflow and DVC [facilitate collaboration across the ML lifecycle]:

  • Used MLflow for experiment tracking, logging metrics, parameters, and artifacts during model training.
  • Used DVC to version-control and manage large datasets efficiently.
  • By integrating MLflow and DVC, we can create a more robust and reproducible machine learning workflow that addresses both code and data versioning concerns.


Prediction Screen:

[Screenshot: prediction screen]

Technologies used

Python, NumPy, Pandas, Scikit-learn, Flask, MongoDB

Tools used

Visual Studio Code, Git, GitHub, AWS-EC2

Contact

Alok Kumar | LinkedIn
Alok Kumar | Gmail
