/1st_Project

This is my first data science project from the bootcamp. Based on exploratory data analysis, I derived recommendations for finding houses in King County (USA). In addition, a multivariate linear regression model was developed to predict house prices.

Primary Language: Jupyter Notebook

1st_Project - King County House Data Set

This repository includes the results of my first data science project during my data science bootcamp at neuefische in Hamburg.

This project covers the complete data science lifecycle on a single dataset. The dataset is the King County House dataset, which contains house sales in the King County area from the beginning of May 2014 to the end of May 2015. My task in this project was to derive at least three recommendations for buyers based on the given dataset and newly developed features. In addition, a multivariate linear regression model was developed to predict house prices. To give an impression of the area, a map of King County is shown here.

Map of King County

This GitHub repository includes the following contents:

  • Jupyter notebook with all code and descriptive text for every step of the data science lifecycle (Jupyter Notebook)
  • Slides of the presentation as a PDF file (Slides)
  • The output figures are stored under figures
  • The original dataset and its column description are stored under rawdata

The online presentation on Google Slides is here

The focus of this project is mainly on EDA (Exploratory Data Analysis), but all steps of the data science lifecycle were carried out during the project. These steps are summarized below:

Business Understanding

  • What is the objective of this project?
  • What problems need to be tackled?

Data Mining

  • Get data or scrape data.

Data Cleaning

  • Fix missing data.
  • Fix inconsistencies based on assumptions.
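As a rough illustration of the two cleaning steps above, here is a minimal, dependency-free sketch. The rows are made up, the column names `sqft_living` and `waterfront` are assumptions, and so are the two rules used (median imputation for a numeric column, treating a missing flag as 0). The notebook may handle this differently; with pandas, the same idea is typically expressed with `fillna`.

```python
from statistics import median

# Made-up rows for illustration only; the column names are assumptions.
rows = [
    {"sqft_living": 1000, "waterfront": 0},
    {"sqft_living": None, "waterfront": None},
    {"sqft_living": 2500, "waterfront": 1},
    {"sqft_living": 2000, "waterfront": None},
]


def fill_missing(rows, column, value):
    """Return new rows with missing entries in `column` replaced by `value`."""
    return [
        {**row, column: value if row[column] is None else row[column]}
        for row in rows
    ]


# Numeric column: impute with the median of the observed values.
observed = [r["sqft_living"] for r in rows if r["sqft_living"] is not None]
cleaned = fill_missing(rows, "sqft_living", median(observed))

# Flag column: assume (hypothetically) that a missing entry means "no waterfront".
cleaned = fill_missing(cleaned, "waterfront", 0)
```

The original `rows` list is left untouched, which makes it easy to compare the data before and after cleaning.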

Data Exploration

Visually analyze your data by using:

  • Correlation analysis
  • Heatmap
  • Histograms
  • Scatter Plots
  • Box plots
  • Surface Plots
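The correlation analysis and heatmap listed above both rest on pairwise Pearson correlation coefficients. The following sketch computes such a correlation matrix in plain Python. The values and the column names `sqft_living`, `price`, and `age` are invented for illustration and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented example values; column names are assumptions.
columns = {
    "sqft_living": [1000, 1500, 2000, 2500, 3000],
    "price": [250000, 340000, 480000, 520000, 690000],
    "age": [60, 45, 30, 20, 5],
}

# Nested dict holding every pairwise correlation: the numbers a heatmap would color.
corr_matrix = {
    a: {b: pearson(columns[a], columns[b]) for b in columns}
    for a in columns
}
```

A heatmap is then just a color-coded rendering of `corr_matrix`, which makes strongly correlated feature pairs easy to spot.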

Feature Engineering

Select important features and derive new, more meaningful ones from the existing data. In this project, new features describing the age of a house, its distance to Seattle, and its renovation status were developed.
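The three kinds of features mentioned above could be derived roughly as follows. This is only a sketch under stated assumptions: the column names (`yr_built`, `yr_renovated`, `lat`, `long`), the convention that `yr_renovated == 0` means "never renovated", the choice of downtown Seattle as reference point, and the use of the haversine great-circle distance are all assumptions, not necessarily what the notebook does.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate coordinates of downtown Seattle (lat, lon); assumed reference point.
SEATTLE = (47.6062, -122.3321)
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def engineer(house, sale_year):
    """Derive age, renovation flag, and distance to Seattle for one house."""
    return {
        "age": sale_year - house["yr_built"],
        # Assumption: a value of 0 means the house was never renovated.
        "renovated": int(house["yr_renovated"] > 0),
        "dist_seattle_km": haversine_km(house["lat"], house["long"], *SEATTLE),
    }


# Invented example house; not a row from the dataset.
house = {"yr_built": 1960, "yr_renovated": 0, "lat": 47.5, "long": -122.25}
features = engineer(house, sale_year=2014)
```

The haversine formula ignores roads and water, so it is only a proxy for how far a house really is from the city.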

Predictive Modelling

Use machine learning algorithms to make predictions. In this case, a multivariate linear regression model that includes the new features was developed.
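To make the idea of a multivariate linear regression concrete, the sketch below fits one with ordinary least squares using NumPy. All numbers are synthetic: the prices are generated without noise from hypothetical coefficients so that the fit can be checked, and the feature names are assumptions. It does not reproduce the model or the results from the notebook.

```python
import numpy as np

# Synthetic feature matrix; assumed columns: [sqft_living, age, dist_seattle_km].
X = np.array([
    [1000.0, 60.0, 20.0],
    [1500.0, 45.0, 15.0],
    [2000.0, 30.0, 12.0],
    [2500.0, 20.0, 8.0],
    [3000.0, 5.0, 5.0],
    [1800.0, 40.0, 25.0],
])

# Hypothetical "true" coefficients used only to generate noise-free prices.
true_coef = np.array([200.0, -500.0, -3000.0])
true_intercept = 100_000.0
y = X @ true_coef + true_intercept

# Prepend a column of ones so the model also learns an intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimize ||X_design @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coef = beta[0], beta[1:]


def predict(features):
    """Predict a price from [sqft_living, age, dist_seattle_km]."""
    return intercept + np.asarray(features, dtype=float) @ coef
```

Because the synthetic prices contain no noise, the fitted coefficients match the generating ones. On real sales data the fit would be approximate and should be evaluated on held-out data.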

Data Visualization

Communicate the key findings using plots and visualizations. In this project, the findings were presented to non-technical stakeholders.