Insurance companies invest significant resources into accurately predicting customer behavior, such as the likelihood of filing insurance claims. On the Road car insurance approached us to build a predictive model that determines whether a customer will make a claim during their policy period. This project aims to identify the most predictive feature using machine learning techniques, specifically logistic regression, to assist On the Road in optimizing their pricing and risk assessment strategies.
The dataset, car_insurance.csv
, contains various client attributes and a binary outcome indicating whether a claim was made:
id
: Unique client identifierage
: Client's age categorygender
: Client's genderdriving_experience
: Years the client has been drivingeducation
: Client's level of educationincome
: Client's income levelcredit_score
: Client's credit score (normalized)vehicle_ownership
: Client's vehicle ownership statusvehicle_year
: Year of vehicle registrationmarried
: Client's marital statuschildren
: Number of childrenpostal_code
: Client's postal codeannual_mileage
: Number of miles driven annuallyvehicle_type
: Type of vehiclespeeding_violations
: Total number of speeding violationsduis
: Number of times caught driving under the influencepast_accidents
: Total number of previous accidentsoutcome
: Binary variable indicating if a claim was made (0: No claim, 1: Made a claim)
- Data Preparation: Handle missing values for
credit_score
andannual_mileage
. - Model Building: Build logistic regression models for each feature to predict the
outcome
. - Model Evaluation: Calculate accuracy metrics to determine the best performing feature.
Ensure you have Python and the following libraries installed:
- pandas
- numpy
- statsmodels
- Clone the repository:
git clone https://github.com/NonsoOmoko/Project-Modelling-Car-Insurance-Claim-Outcomes.git cd Project-Modelling-Car-Insurance-Claim-Outcomes