In this project, I used 2017 data to build models that estimate the probability that a customer will file a claim in the following year (2018) and the severity of that claim in USD. Customers are then assigned to risk profiles based on their predicted claim probability.
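The risk-profiling step can be sketched as follows. This is a minimal illustration, not the project's actual code. The tier cut-offs and the `PolicyPrediction`, `risk_tier`, and `expected_cost` names are hypothetical. Combining probability and severity into an expected cost (probability × severity) is a common frequency-severity convention assumed here; the project itself only states that profiles are based on claim probability.

```python
from dataclasses import dataclass

# Hypothetical tier cut-offs on the predicted 2018 claim probability.
# The project does not state its thresholds, so these are placeholders.
TIER_CUTOFFS = [(0.05, "Low"), (0.15, "Medium"), (1.0, "High")]


@dataclass
class PolicyPrediction:
    pol_number: str
    claim_prob: float         # P(claim in 2018), output of the probability model
    expected_severity: float  # predicted claim amount in USD, given a claim


def risk_tier(claim_prob: float) -> str:
    """Map a claim probability in [0, 1] to a risk tier label."""
    if not 0.0 <= claim_prob <= 1.0:
        raise ValueError("claim_prob must be in [0, 1]")
    for upper, label in TIER_CUTOFFS:
        if claim_prob <= upper:
            return label
    return TIER_CUTOFFS[-1][1]  # not reached: the last cut-off is 1.0


def expected_cost(p: PolicyPrediction) -> float:
    """Assumed frequency-severity combination: P(claim) * E[severity | claim]."""
    return p.claim_prob * p.expected_severity
```

The tier depends only on the predicted probability, in line with the project description; `expected_cost` is an optional extra that could be used to rank policies within a tier.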
A glossary of the dataset fields is provided below:
- pol_number: policy number for the insurance policy
- pol_eff_dt: auto insurance policy effective date
- gender: gender of driver: F, M
- agecat: driver's age category: 1 (youngest), 2, 3, 4, 5, 6
- date_of_birth: driver's date of birth
- credit_score: driver's credit score (integer): 1-100, where 1 = poor and 100 = excellent
- area: driver's area of residence: A, B, C, D, E, F
- traffic_index: traffic index of the driver's area of residence (integer): 100 = country average; values above 100 indicate worse traffic conditions than average
- veh_age: age of vehicle (categorical): 1 (youngest), 2, 3, 4
- veh_body: vehicle body type, coded as: BUS (bus), CONVT (convertible), COUPE (coupe), HBACK (hatchback), HDTOP (hardtop), MCARA (motorized caravan), MIBUS (minibus), PANVN (panel van), RDSTR (roadster), SEDAN (sedan), STNWG (station wagon), TRUCK (truck), UTE (utility)
- veh_value: vehicle value, in units of $10,000
- smonths_insured: number of months for which vehicle insurance was purchased (integer)
- claim_office: office location of the claim-handling agent: A, B, C, D
- numclaims: number of claims (integer): 0 if no claim
- claimcst0: claim amount (USD): 0 if no claim
- annual_premium: total premium charged
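Before modelling, the glossary fields can be turned into the two targets implied by the project description: a binary claim indicator for the probability model and the claim amount (among claimants only) for the severity model. The sketch below uses only the Python standard library and rests on assumptions: the function names and the CSV `path` argument are hypothetical, and it treats `numclaims > 0` as "made a claim".

```python
import csv
from typing import Iterable, List, Tuple


def derive_targets(row: dict) -> dict:
    """Add modelling targets to one raw 2017 record (string values, as from csv.DictReader)."""
    numclaims = int(row["numclaims"])
    claim_amount = float(row["claimcst0"])
    out = dict(row)
    out["has_claim"] = 1 if numclaims > 0 else 0                    # target for the probability model
    out["severity_usd"] = claim_amount if numclaims > 0 else None   # target for the severity model
    out["veh_value_usd"] = float(row["veh_value"]) * 10_000         # veh_value is in units of $10,000
    return out


def split_training_sets(rows: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Probability set = every policy; severity set = only policies with a claim."""
    enriched = [derive_targets(r) for r in rows]
    severity_rows = [r for r in enriched if r["has_claim"] == 1]
    return enriched, severity_rows


def load_rows(path: str) -> List[dict]:
    """Read the dataset from a CSV file; 'path' is a hypothetical file name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

Fitting the severity model only on policies with a claim keeps the many zeros in `claimcst0` from distorting the predicted claim amounts.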