- Open this repository in Google Colab as shown in class
- When satisfied that you have completed the required changes in Colab, commit your Jupyter Notebook back to GitHub as shown in class.
- You must ensure that your solution has been pushed to GitHub in order to get credit for the exercise.
-
We will use the included Iowa House Price training dataset (source: Kaggle, House Prices - Advanced Regression Techniques).
-
Import the data into Pandas. If you are using Google Colab, one option is to read the CSV directly from its raw GitHub URL.
-
Exploratory Data Analysis (EDA):
- Load the dataset into a Pandas DataFrame.
- Check data types and basic statistics using Pandas functions.
NOTE: The dataset is large and has many columns. You might want to follow the YData Profiling suggestions for handling large datasets (see "Profiling large datasets" in the YData Profiling documentation).
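
The EDA steps above could be sketched roughly as follows. This is only a minimal illustration: `DATA_URL` is a hypothetical placeholder (substitute the raw GitHub URL of the CSV in your own repository), `load_and_summarize` is a helper name made up for this example, and the tiny inline CSV exists only so the snippet runs without network access.

```python
import io
import pandas as pd

# Hypothetical placeholder: replace with the raw GitHub URL of the CSV in your repository.
DATA_URL = "https://raw.githubusercontent.com/<user>/<repo>/main/train.csv"

def load_and_summarize(source):
    """Read a CSV (path, URL, or file-like object) and return the frame plus quick summaries."""
    df = pd.read_csv(source)
    dtypes = df.dtypes                   # data type of each column
    stats = df.describe(include="all")   # basic statistics for numeric and non-numeric columns
    return df, dtypes, stats

# Tiny stand-in CSV so the sketch runs offline; the column names are assumptions
# based on the Kaggle House Prices data.
sample_csv = io.StringIO("SalePrice,Neighborhood\n200000,NAmes\n150000,OldTown\n")
df, dtypes, stats = load_and_summarize(sample_csv)
print(dtypes)
print(stats)
```

In Colab you would call `load_and_summarize(DATA_URL)` instead of passing the inline sample. `df.info()` is another quick way to see column types and non-null counts.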
-
Data Profiling:
- Use YData Profiling or a similar tool to generate a data profile report.
- Analyze key data quality metrics such as completeness, uniqueness, and missing values.
- Explore the distribution of each feature
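
As a rough sketch of the profiling phase, the snippet below computes a few data quality metrics with plain Pandas and wraps report generation in a helper. The names `quality_summary` and `build_profile` are made up for this example, and the demo columns are assumptions. `minimal=True` is the YData Profiling setting intended for large datasets; whether it suits this exercise is a judgment call.

```python
import pandas as pd

def quality_summary(df):
    """Per-column missing count, completeness, and uniqueness."""
    n = len(df)
    return pd.DataFrame({
        "missing": df.isna().sum(),            # number of missing values
        "completeness": df.notna().mean(),     # share of non-missing values
        "unique_values": df.nunique(),         # distinct non-missing values
        "uniqueness": df.nunique() / n,        # distinct values relative to row count
    })

def build_profile(df, title="House Prices Profile"):
    """Build a YData Profiling report (requires the ydata-profiling package)."""
    # Imported lazily so the Pandas-only summary works without the package installed.
    from ydata_profiling import ProfileReport
    # minimal=True switches off the most expensive computations.
    return ProfileReport(df, title=title, minimal=True)

# Small demo frame; column names are assumptions based on the Kaggle data.
demo = pd.DataFrame({
    "LotFrontage": [65.0, None, 80.0, 65.0],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
})
summary = quality_summary(demo)
print(summary)
```

With the package installed, `build_profile(df).to_file("profile.html")` writes the report to disk, and `build_profile(df).to_notebook_iframe()` displays it inside a notebook.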
-
Data Visualization:
- Use Seaborn, Plotly, or both to generate some useful data visualizations.
- Create scatter plots or pair plots to visualize relationships between numerical features and the target variable (sale prices).
- Utilize box plots or violin plots to showcase variations in the target variable across different categories.
- Copy screenshots into the repository showing the output(s) of each of the three phases above (EDA, Data Profiling, and Data Visualization).
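
One possible way to produce the scatter plot and box plot described above with Seaborn is sketched below. The helper name `plot_price_relationships` is made up for this example, and the default column names (`GrLivArea`, `Neighborhood`, `SalePrice`) are assumptions based on the Kaggle data; swap in whichever features you want to examine.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def plot_price_relationships(df, numeric_col="GrLivArea",
                             category_col="Neighborhood", target="SalePrice"):
    """Draw a scatter plot and a box plot of the target side by side."""
    fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(14, 5))
    # Relationship between one numerical feature and the target.
    sns.scatterplot(data=df, x=numeric_col, y=target, ax=ax_scatter)
    # Variation of the target across the levels of one categorical feature.
    sns.boxplot(data=df, x=category_col, y=target, ax=ax_box)
    ax_box.tick_params(axis="x", rotation=90)  # keep long category labels readable
    fig.tight_layout()
    return fig, ax_scatter, ax_box
```

Calling `plot_price_relationships(df)` in a notebook cell renders both panels, and `fig.savefig("visualization.png")` is one way to capture the output for the repository. `sns.violinplot` accepts the same `data`, `x`, `y`, and `ax` arguments if you prefer violin plots.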
| Final Grade | Requirement |
|---|---|
| 10/10 | Exercise is correctly done (for the most part) and is completed within the allotted in-class time. |
| 8/10 | Exercise is correctly done (for the most part) and is completed within a 12-hour grace period beginning immediately following the end of in-class time. |
| 6/10 | Exercise is correctly done (for the most part) and is completed and submitted after the 12-hour grace period has elapsed. |
| 0/10 | Exercise is not submitted or is largely incomplete. |