House Prices: Advanced Regression Techniques

Authors:

ChatGPT was used to generate some of the code and to help write comments for other parts of it.

Introduction

Welcome to this machine learning notebook on House Prices: Advanced Regression Techniques. It is a practical project for an introductory machine learning course, and its task is to predict the sale prices of houses from their features. The dataset is based on the Ames Housing dataset, which was compiled by Dean De Cock for use in data science education.

In this notebook, we implement and compare three machine learning models for predicting house prices: linear regression, K-Nearest Neighbours (KNN), and Random Forest. Each model is evaluated with the Root Mean Square Error (RMSE), which expresses the typical size of the prediction error in the same units as the sale price. The model with the lowest RMSE is the one that predicts house prices most accurately.
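As a minimal sketch of how RMSE can be used to compare models, the snippet below computes it in plain Python. The sale prices, the predictions, and the names `model_a` and `model_b` are invented purely for illustration; they are not results from the notebook, and the notebook itself may compute RMSE differently (for example with a library function).

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # Square each error, average the squares, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy sale prices and predictions (hypothetical values, not real results).
actual = [200000.0, 150000.0, 320000.0, 275000.0]
predictions = {
    "model_a": [210000.0, 140000.0, 310000.0, 285000.0],  # off by 10,000 on every house
    "model_b": [230000.0, 150000.0, 320000.0, 275000.0],  # off by 30,000 on one house only
}

# Score every model and pick the one with the lowest RMSE.
scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.2f}")
print("Lowest RMSE:", best_model)
```

In this toy example `model_b` has the smaller total error, yet its RMSE is higher because squaring the errors penalizes a single large miss more heavily than several small ones. That property is worth keeping in mind when RMSE is the only metric used to rank models.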

The project covers the full machine learning pipeline: data cleaning and pre-processing, feature selection, model selection and tuning, and evaluation. We also use visualizations to explore the data and present our findings.

Our goal is to find the most suitable regression model for predicting house prices and to identify the features that contribute most to the predictions. The data processing and feature selection steps are used to improve each model's performance.

Read the notebook.

A concluding analysis of the project can be seen here.