A project for some practice with the data analysis workflow. Please read the report of my findings!
This analysis aims to provide a brief insight into how factors such as age and gender relate to car insurance quote prices. It limits its scope to the individual applying for the quote, and the assumptions insurance companies may make about them based on these characteristics, rather than the car being insured.
The project gave me the opportunity to learn more about the following areas:
- Web automation and web scraping
- Exploratory data analysis
- Inferential statistics
- Data visualization
and improve my skills in the use of:
- Python
- Selenium
- Beautiful Soup
- Pandas
- Matplotlib
Data for the project was gathered using web automation tools to retrieve quotes for combinations of personas and cars. The list of personas is found in people.csv, each with a name, age and gender. The provided set includes 164 instances, one male and one female for every age between 18 and 98. The list of cars is simply a list of registration numbers for real cars registered in Victoria, Australia, so the cars.csv file has been omitted from the repository for the privacy of the car owners and replaced with example_cars.csv.
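As a rough illustration of that layout, the sketch below builds a persona list with one male and one female entry per age and serialises it as CSV. The column names (`name`, `age`, `gender`), the placeholder names, and the `build_personas` and `personas_to_csv` helpers are assumptions for illustration, not necessarily what people.csv actually contains.

```python
import csv
import io


def build_personas(min_age, max_age):
    """Return one male and one female persona for every age in [min_age, max_age]."""
    personas = []
    for age in range(min_age, max_age + 1):
        for gender in ("male", "female"):
            # Placeholder name; the real file uses its own names.
            personas.append({"name": f"{gender}_{age}", "age": age, "gender": gender})
    return personas


def personas_to_csv(personas):
    """Serialise personas to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "age", "gender"])
    writer.writeheader()
    writer.writerows(personas)
    return buffer.getvalue()


# A small age range keeps the example output short.
sample = build_personas(18, 20)
csv_text = personas_to_csv(sample)
print(csv_text)
```

Widening the age range passed to `build_personas` would produce a larger set with the same structure.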
In order to produce the data in quotes.csv, quote_scraper.py uses https://www.comparethemarket.com.au/car-insurance/journey/start to retrieve a number of quotes for each combination of a person and a car.
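The pairing step can be sketched independently of the browser automation. The example below only enumerates every person and car combination; `fetch_quotes` is a hypothetical stand-in for the Selenium logic in quote_scraper.py, `collect_quotes` is an illustrative helper, and the sample personas and registration numbers are made up.

```python
from itertools import product


def fetch_quotes(person, registration):
    """Hypothetical stand-in for the browser automation step.

    The real scraper would drive the comparison site here; this stub
    returns an empty list so the sketch runs without network access.
    """
    return []


def collect_quotes(people, registrations, fetch=fetch_quotes):
    """Call fetch for every (person, car) pair and flatten the results into rows."""
    rows = []
    for person, registration in product(people, registrations):
        for quote in fetch(person, registration):
            # Each row carries the persona fields, the car, and the quote fields.
            rows.append({**person, "registration": registration, **quote})
    return rows


people = [
    {"name": "Alex", "age": 18, "gender": "male"},
    {"name": "Sam", "age": 18, "gender": "female"},
]
registrations = ["ABC123", "XYZ789"]  # made-up plates, not real cars

pairs = list(product(people, registrations))
print(len(pairs))  # 2 people x 2 cars = 4 combinations
```

Passing the fetch step in as a parameter keeps the looping logic testable without opening a browser.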
As can be seen in cleaning_data.ipynb, I cleaned and cut the data down to the specific focus of the exercise: to examine the relationship between age, gender and comprehensive car insurance quote prices. The resulting dataframe is written out to quotesByAgeGender.csv.
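A minimal sketch of this kind of narrowing with Pandas is shown below. The rows are made up, and the column names (`cover_type`, `price`) and the currency formatting are assumptions; the actual steps are the ones in cleaning_data.ipynb.

```python
import pandas as pd

# Made-up rows standing in for quotes.csv; column names are assumptions.
raw = pd.DataFrame(
    {
        "age": [18, 18, 40, 40, 40],
        "gender": ["male", "female", "male", "female", "female"],
        "cover_type": [
            "Comprehensive",
            "Comprehensive",
            "Third Party",
            "Comprehensive",
            "Comprehensive",
        ],
        "price": ["$2,100.50", "$1,900.00", "$600.00", "$1,000.00", None],
    }
)

# Keep only comprehensive quotes and the columns of interest.
cleaned = raw.loc[raw["cover_type"] == "Comprehensive", ["age", "gender", "price"]].copy()

# Strip currency formatting and convert prices to numbers.
cleaned["price"] = pd.to_numeric(
    cleaned["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

# Drop rows where no usable price was scraped.
cleaned = cleaned.dropna(subset=["price"]).reset_index(drop=True)
print(cleaned)
```

A frame like this could then be saved with `cleaned.to_csv("quotesByAgeGender.csv", index=False)`.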
All code for plotting the data from quotesByAgeGender.csv can be found in analysis.ipynb. Please read the report of my findings here.
I am learning! Please take any findings from this project with a grain of salt. If you have any feedback or constructive criticism of my work, please reach out. I'd love to hear it: ldavoli.mail@gmail.com