DBAS 5125 - Exercise 6

Case Study #1: House Prices Dataset

Instructions:

Open this repository in Google Colab as shown in class.
When satisfied that you have completed the required changes in Colab, commit your Jupyter Notebook back to GitHub as shown in class.
You must ensure that your solution has been pushed to GitHub in order to get credit for the exercise.

Preliminary Steps

We will use the included Iowa House Price training dataset (source: Kaggle, House Prices - Advanced Regression Techniques).
Import the data into Pandas, for example by reading the file's raw GitHub URL if you are using Google Colab.
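One way the import step might look is sketched below. The URL is a hypothetical placeholder (substitute the raw link to the CSV in your own repository), and the file name `train.csv`, the helper `load_house_prices`, and the two sample rows are illustrative assumptions, not part of the exercise.

```python
import io

import pandas as pd

# Hypothetical placeholder: replace with the raw URL of the CSV in your repository.
RAW_URL = "https://raw.githubusercontent.com/<user>/<repo>/<branch>/train.csv"


def load_house_prices(source):
    """Read the CSV from a URL, a local path, or any file-like object."""
    return pd.read_csv(source)


# In Colab you would call: df = load_house_prices(RAW_URL)
# Offline stand-in (made-up rows) so this sketch runs without network access:
sample_csv = io.StringIO("Id,LotArea,SalePrice\n1,8000,200000\n2,9500,180000\n")
df = load_house_prices(sample_csv)
print(df.shape)
```

Because `pd.read_csv` accepts URLs directly, no separate download step should be needed in Colab.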

Tasks

Exploratory Data Analysis (EDA):
- Load the dataset into a Pandas DataFrame.
- Check the data types and basic statistics using Pandas functions.
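A minimal sketch of these EDA checks follows, using a tiny made-up frame in place of the real dataset. The column names are assumed to match the Kaggle file and the values are invented for illustration.

```python
import pandas as pd

# Made-up stand-in for the real dataset; replace with the DataFrame you loaded.
toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", None],
    "SalePrice": [200000, 180000, 220000, 140000],
})

# Data type of every column.
dtypes = toy.dtypes

# count, mean, std, min, quartiles, and max for the numeric columns.
numeric_stats = toy.describe()

# count, unique, top, and freq for the non-numeric columns.
categorical_stats = toy.select_dtypes(exclude="number").describe()

print(dtypes)
print(numeric_stats)
print(categorical_stats)
```

On the full dataset, `toy.info()` is another quick way to see dtypes and non-null counts in a single listing.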

NOTE: The dataset is large and has many columns. You might want to follow the YData suggestions for handling large datasets (YData Profiling: Profiling large datasets).

Data Profiling:
- Use YData Profiling or a similar tool to generate a data profile report.
- Analyze key data quality metrics such as completeness, uniqueness, and missing values.
- Explore the distribution of each feature.
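The profiling phase could be sketched as below. The helper names `quality_metrics` and `write_profile` and the toy data are hypothetical. The report helper assumes the `ydata-profiling` package is installed and uses its `minimal=True` mode, which skips the costlier computations and is one option for wide datasets.

```python
import pandas as pd


def quality_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: completeness, uniqueness, and missing-value count."""
    return pd.DataFrame({
        "completeness": df.notna().mean(),     # fraction of rows that have a value
        "uniqueness": df.nunique() / len(df),  # distinct non-missing values per row
        "missing": df.isna().sum(),            # absolute number of missing values
    })


def write_profile(df: pd.DataFrame, path: str = "profile.html") -> str:
    """Write a YData Profiling report to an HTML file and return its path."""
    from ydata_profiling import ProfileReport  # lazy import: optional dependency

    ProfileReport(df, title="House Prices profile", minimal=True).to_file(path)
    return path


# Made-up stand-in for the real dataset.
toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", None],
    "SalePrice": [200000, 180000, 220000, 140000],
})
metrics = quality_metrics(toy)
print(metrics)
```

Calling `write_profile(df)` on the real DataFrame would produce the HTML report. The pandas table gives a quick cross-check of the same completeness and uniqueness figures.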

Data Visualization:
- Use Seaborn, Plotly, or both to generate some useful data visualizations.
- Create scatter plots or pair plots to visualize relationships between numerical features and the target variable (sale prices).
- Utilize box plots or violin plots to showcase variations in the target variable across different categories.
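As an illustration of the two plot types above, here is a small Seaborn sketch on made-up data. The column names are assumed to match the Kaggle file, the values are invented, and the `Agg` backend line is only there so the code also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; this line can be omitted in Colab

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the real dataset.
toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000, 12000, 7000],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Veenker", "CollgCr", "Veenker"],
    "SalePrice": [200000, 180000, 220000, 140000, 250000, 130000],
})

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Numerical feature against the target variable.
sns.scatterplot(data=toy, x="LotArea", y="SalePrice", ax=ax_scatter)
ax_scatter.set_title("Sale price vs. lot area")

# Spread of the target variable within each category.
sns.boxplot(data=toy, x="Neighborhood", y="SalePrice", ax=ax_box)
ax_box.set_title("Sale price by neighborhood")

fig.tight_layout()
```

Swapping `sns.boxplot` for `sns.violinplot` gives the violin variant, and `fig.savefig("plots.png")` would write an image file that could serve as a screenshot.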

Copy screenshots into the repository showing the output(s) of each of the three phases above.

Marking Scheme

Final Grade	Requirement
10/10	Exercise is correctly done (for the most part) and is completed within the allotted in-class time.
8/10	Exercise is correctly done (for the most part) and is completed within a 12-hour grace period beginning immediately following the end of in-class time.
6/10	Exercise is correctly done (for the most part) and is completed and submitted after the 12-hour grace period has elapsed.
0/10	Exercise is not submitted or is largely incomplete.

Written with StackEdit.
