first_EDA_project

After two weeks of the Data Science Bootcamp at neuefische, we started our first EDA project.

Primary language: Jupyter Notebook

Analysis of King County Data

Description

This is my first project during the bootcamp. In it, I work with the King County House Sales dataset. The focus is on exploratory data analysis (EDA) as part of a complete data science lifecycle, which the project follows in these steps:

  • Business Understanding
  • Data Mining
  • Data Cleaning
  • Data Exploration / Analysis
  • Feature Engineering
  • Predictive Modelling
  • Data Visualization
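The cleaning, feature-engineering and exploration steps could look roughly like the short pandas sketch below. It makes several assumptions that are not taken from the project notebook:

- The column names (`date`, `price`, `sqft_living`, `yr_built`, `waterfront`) follow the public King County dataset.
- A missing `waterfront` value is treated as "no waterfront".
- The helpers `clean`, `engineer_features` and `median_price_by` are hypothetical.
- The tiny DataFrame is made-up illustration data.

```python
import numpy as np
import pandas as pd


def clean(df):
    """Data cleaning sketch: drop duplicate rows, parse dates, fill missing flags."""
    df = df.drop_duplicates().copy()
    df["date"] = pd.to_datetime(df["date"])
    # Assumption: a missing waterfront flag means "no waterfront"
    df["waterfront"] = df["waterfront"].fillna(0)
    return df


def engineer_features(df):
    """Feature engineering sketch: derive house age at sale and price per sqft."""
    df = df.copy()
    df["age_at_sale"] = df["date"].dt.year - df["yr_built"]
    df["price_per_sqft"] = df["price"] / df["sqft_living"]
    return df


def median_price_by(df, column):
    """Exploration sketch: median sale price for each value of `column`."""
    return df.groupby(column)["price"].median()


# Made-up rows for illustration; row 1 duplicates row 0
raw = pd.DataFrame({
    "date": ["2014-10-13", "2014-10-13", "2015-02-25", "2014-12-09"],
    "price": [221900.0, 221900.0, 180000.0, 1000000.0],
    "sqft_living": [1180, 1180, 770, 2000],
    "yr_built": [1955, 1955, 1933, 1990],
    "waterfront": [0.0, 0.0, np.nan, 1.0],
})

houses = engineer_features(clean(raw))
print(houses[["date", "age_at_sale", "price_per_sqft"]])
print(median_price_by(houses, "waterfront"))
```

Comparisons like the median price by `waterfront` are the kind of grouped summary from which seller/buyer recommendations can be derived. On the real data they should be checked against confounders such as size and location.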

The data

The dataset is in the file "King_County_House_prices_dataset.csv" in this folder. The column names are described in the column_names.md file in this repository.
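A minimal pandas sketch for loading the CSV and getting a first overview is shown below. The `date` and `waterfront` column names are assumptions based on the public King County dataset, so check column_names.md for the actual schema. The helpers `load_housing_data` and `summarize` are hypothetical, and the three inline rows stand in for the real file.

```python
import io

import pandas as pd


def load_housing_data(source):
    """Load the house sales CSV (path or file-like object) and parse the sale date.

    The `date` column name is an assumption; adjust it to the real schema.
    """
    df = pd.read_csv(source)
    if "date" in df.columns:
        df["date"] = pd.to_datetime(df["date"])
    return df


def summarize(df):
    """Quick overview: row/column counts and missing values per column."""
    return {
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "missing": df.isna().sum().to_dict(),
    }


# Inline sample standing in for King_County_House_prices_dataset.csv;
# the first row has an empty waterfront value
sample = io.StringIO(
    "id,date,price,bedrooms,sqft_living,waterfront\n"
    "1,10/13/2014,221900,3,1180,\n"
    "2,12/9/2014,538000,3,2570,0\n"
    "3,2/25/2015,180000,2,770,0\n"
)
df = load_housing_data(sample)
summary = summarize(df)
print(summary)
```

With the real file, `load_housing_data("King_County_House_prices_dataset.csv")` would be called instead of passing the inline sample.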

Tasks

Using statistical analysis/EDA, come up with AT LEAST 3 recommendations for home sellers and/or buyers in King County (more than 3 can earn bonus points). Then model the dataset with a multivariate linear regression to predict house sale prices as accurately as possible; acceptable R squared values are 0.7 to 0.9.

Optional: Split the dataset into a train and a test set. Use Root Mean Squared Error (RMSE) as your metric of success and try to minimize this score on your test data.
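The modelling task can be sketched in plain numpy. The model is fitted by ordinary least squares with an intercept column. It is scored with R squared, 1 − SS_res / SS_tot, and with RMSE, the square root of the mean squared residual, on a held-out test set.

This is not necessarily how the notebook does it; scikit-learn's `LinearRegression`/`train_test_split` or statsmodels are common alternatives. The helper names (`split_train_test`, `fit_ols`, `predict`, `r_squared`, `rmse`) are hypothetical. The features and prices are synthetic, so the printed scores say nothing about the King County results.

```python
import numpy as np


def split_train_test(X, y, test_size=0.2, seed=42):
    """Shuffle row indices and split into train and test sets (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(len(y) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def fit_ols(X, y):
    """Multivariate linear regression via least squares; coef[0] is the intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply intercept plus weighted sum of features."""
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the price."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data: two hypothetical features and a noisy linear price
rng = np.random.default_rng(0)
n = 1000
sqft_living = rng.uniform(500, 4000, n)
bedrooms = rng.integers(1, 6, n)
X = np.column_stack([sqft_living, bedrooms])
y = 50_000 + 250 * sqft_living - 10_000 * bedrooms + rng.normal(0, 50_000, n)

X_train, X_test, y_train, y_test = split_train_test(X, y)
coef = fit_ols(X_train, y_train)
y_pred = predict(coef, X_test)
print(f"R^2 (test): {r_squared(y_test, y_pred):.3f}")
print(f"RMSE (test): {rmse(y_test, y_pred):,.0f}")
```

Scoring only on the test split, as the optional task asks, guards against an R squared that looks good merely because the model has memorized the training rows.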

Result

The results of the project can be found in the attached Jupyter notebook and in the attached slides.