The domain of the problem

This data set contains information from the Ames Assessor’s Office used in computing assessed values for individual residential properties sold in Ames, IA from 2006 to 2010.

Problem statement

We'll use supervised learning to develop a regression model that predicts housing sale price. We'll also group the houses by clustering the data.

Dataset Description

The dataset has 1451 samples and 80 attributes: 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables. The SalePrice attribute is the target; it is a continuous value.

A proposed solution

One solution is to develop a linear regression model for SalePrice and to cluster the data into different groups.
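
The two parts of this proposal can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the final model: it uses a single hypothetical feature (`living_area`), made-up numbers rather than rows from the Ames data, and hand-written helpers (`fit_line`, `kmeans_1d`) whose names are invented for this example. A real model would use many of the 80 attributes and most likely a library implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))


def kmeans_1d(values, centers, n_iter=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(n_iter):
        # Assignment step: attach each value to its nearest center.
        labels = [nearest(v, centers) for v in values]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    # Final assignment against the updated centers.
    return [nearest(v, centers) for v in values], centers


# Made-up data for illustration only (hypothetical feature, not Ames rows).
living_area = [1000, 1200, 1500, 2500, 2800, 3000]
sale_price = [150000, 170000, 200000, 300000, 330000, 350000]

# Supervised part: regress price on the single feature.
intercept, slope = fit_line(living_area, sale_price)
predicted = intercept + slope * 2000

# Unsupervised part: split the houses into two groups by size.
labels, centers = kmeans_1d(living_area, [min(living_area), max(living_area)])
```

The regression and the clustering are kept separate here because they answer different questions: the first predicts SalePrice from features, while the second groups houses without looking at the target at all.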

A benchmark model

A good naive benchmark could be to predict the mean or median of SalePrice for every house.
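
As a minimal sketch of such a baseline, the snippet below predicts one constant for every house, taken from the training prices. The prices are made-up illustrative values, not rows from the dataset, and `constant_baseline` is a hypothetical helper name.

```python
from statistics import mean, median


def constant_baseline(train_prices, n_predictions, strategy="mean"):
    """Return a list that repeats the mean or median of train_prices."""
    if strategy == "mean":
        value = mean(train_prices)
    elif strategy == "median":
        value = median(train_prices)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [value] * n_predictions


# Made-up sale prices for illustration only (not taken from the dataset).
train_prices = [100000, 150000, 200000, 250000, 800000]

mean_preds = constant_baseline(train_prices, 3, "mean")
median_preds = constant_baseline(train_prices, 3, "median")
```

With these numbers the single expensive house pulls the mean well above the median, which is one reason to report both variants of the baseline.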

A performance metric

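We can calculate the coefficient of determination (R²) or the MSE (mean squared error) to quantify our model’s performance. MSE is in squared price units and lower is better; R² is unitless and higher is better, with 1 being a perfect fit.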
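
Both metrics can be computed directly from their definitions, as in the sketch below. The helper names `mse` and `r_squared` and the example prices are assumptions made for illustration; they are not taken from the dataset or from any library.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up true and predicted prices for illustration only.
y_true = [200000, 150000, 300000]
y_pred = [210000, 140000, 290000]

model_mse = mse(y_true, y_pred)
model_r2 = r_squared(y_true, y_pred)

# Predicting the mean of y_true for every sample gives R² = 0 by definition,
# so a useful model should score above zero.
mean_value = sum(y_true) / len(y_true)
baseline_r2 = r_squared(y_true, [mean_value] * len(y_true))
```

Because R² compares the model's squared error against that of always predicting the mean, it doubles as a direct comparison with the mean benchmark.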