
Data Profiling and Visualization project

DBAS 5125 - Exercise 6

Case Study #1: House Prices Dataset


  • Open this repository in Google Colab as shown in class.
  • When satisfied that you have completed the required changes in Colab, commit your Jupyter Notebook back to GitHub as shown in class.
  • You must ensure that your solution has been pushed to GitHub in order to get credit for the exercise.

Preliminary Steps


  1. Exploratory Data Analysis (EDA):

    • Load the dataset into a Pandas DataFrame.
    • Check the data types and basic statistics using Pandas functions.

NOTE: The dataset is large and has many columns. You might want to follow the YData Profiling suggestions for handling large datasets (see "Profiling large datasets" in the YData Profiling documentation).
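
A minimal sketch of the EDA step, assuming the dataset is stored as a CSV file. The exercise does not name the file, so the path in the usage note is a placeholder, and the helper names `load_dataset` and `summarize` are hypothetical.

```python
import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Read a CSV file into a DataFrame (assumes CSV; adjust for other formats)."""
    return pd.read_csv(path)


def summarize(df: pd.DataFrame) -> dict:
    """Collect the basic facts the EDA step asks for."""
    return {
        "shape": df.shape,                         # (rows, columns)
        "dtypes": df.dtypes,                       # data type of each column
        "numeric_stats": df.describe(),            # count, mean, std, quartiles of numeric columns
        "all_stats": df.describe(include="all"),   # also summarizes non-numeric columns
    }
```

In a Colab cell this might be used as `df = load_dataset("house_prices.csv")` followed by `summarize(df)["numeric_stats"]`, where `house_prices.csv` stands in for the actual file in the repository. Calling `df.info()` additionally prints non-null counts and memory usage per column.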

  2. Data Profiling:

    • Use YData Profiling or a similar tool to generate a data profile report.
    • Analyze key data quality metrics such as completeness, uniqueness, and missing values.
    • Explore the distribution of each feature.
  3. Data Visualization:

    • Use Seaborn, Plotly, or both to generate some useful data visualizations.
    • Create scatter plots or pair plots to visualize relationships between numerical features and the target variable (sale prices).
    • Utilize box plots or violin plots to showcase variations in the target variable across different categories.
  • Copy screenshots into the repository showing the output(s) of each of the three phases above.
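
For the Data Profiling phase, the sketch below shows one possible approach, not a required solution. `quality_summary` is a hypothetical pandas-only helper that computes completeness, uniqueness, and missing-value counts per column. `write_profile` is a hypothetical wrapper around YData Profiling's `ProfileReport`, run in minimal mode on the assumption that the dataset is wide enough to make a full report slow. The report title and output file name are placeholders.

```python
import pandas as pd


def quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing counts, completeness, and uniqueness."""
    n_rows = len(df)
    missing = df.isna().sum()      # number of missing cells per column
    unique = df.nunique()          # distinct non-missing values per column
    return pd.DataFrame({
        "missing": missing,
        "completeness": 1 - missing / n_rows,   # share of non-missing cells
        "unique": unique,
        "uniqueness": unique / n_rows,          # distinct values relative to row count
    })


def write_profile(df: pd.DataFrame, out_path: str = "profile_report.html") -> None:
    """Write a YData Profiling HTML report in minimal mode."""
    # Imported here so quality_summary can be used without ydata-profiling installed.
    from ydata_profiling import ProfileReport

    # minimal=True disables the most expensive computations for large datasets.
    profile = ProfileReport(df, title="House Prices Profiling Report", minimal=True)
    profile.to_file(out_path)
```

In Colab, `ydata-profiling` may need to be installed first with `pip install ydata-profiling`. The per-feature distributions asked for above appear in the generated HTML report.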
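
For the Data Visualization phase, a Seaborn-based sketch might look like the following. The exercise does not list column names, so the target and category columns are passed in as parameters. The helper names and the output file names `pairplot.png` and `boxplot.png` are hypothetical. Choosing the features most correlated with the target is one way to keep the pair plot readable on a wide dataset; it is an assumption here, not something the exercise prescribes.

```python
import pandas as pd


def top_numeric_features(df: pd.DataFrame, target: str, n: int = 4) -> list:
    """Names of the n numeric columns most correlated (in absolute value) with the target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(n).index.tolist()


def plot_relationships(df: pd.DataFrame, target: str, category: str) -> None:
    """Save a pair plot and a box plot relating features to the target."""
    # Plotting libraries are imported here so the helper above works without them.
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Scatter plots of the target (y-axis) against each selected numeric feature (x-axis).
    features = top_numeric_features(df, target)
    grid = sns.pairplot(df, x_vars=features, y_vars=[target])
    grid.savefig("pairplot.png")

    # Box plot of the target within each level of a categorical column.
    fig, ax = plt.subplots(figsize=(10, 5))
    sns.boxplot(data=df, x=category, y=target, ax=ax)
    ax.tick_params(axis="x", rotation=90)   # keep long category labels legible
    fig.savefig("boxplot.png", bbox_inches="tight")
```

Swapping `sns.boxplot` for `sns.violinplot` with the same arguments gives the violin-plot variant. The saved PNG files can serve as the screenshots to copy into the repository.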

Marking Scheme

Final Grade  Requirement
10/10        Exercise is correctly done (for the most part) and is completed within the allotted in-class time.
8/10         Exercise is correctly done (for the most part) and is completed within a 12-hour grace period beginning immediately following the end of in-class time.
6/10         Exercise is correctly done (for the most part) and is completed and submitted after the 12-hour grace period has elapsed.
0/10         Exercise is not submitted or is largely incomplete.

Written with StackEdit.