Madrid-Real-Estate-Price-Prediction

This is a supervised learning model that predicts the price of real estate in Madrid, Spain.

Primary language: R



The data was collected from Kaggle.

The dataset consists of listings from popular real estate portals in Madrid. I first cleaned the data in Excel and then imported it into R.

[Figure: Data Sheet]

As the data sheet shows, there are 16 columns in the data.

[Figure: dimensions_madrid]

The data set contains 145 unique areas. We then added a sq_ft column and a buy_price_per_sq_ft column. The buy_price depends on many factors, including the size of the property, so it varies widely; dividing it by floor area gives buy_price_per_sq_ft, a more comparable measure that offers a glimpse into the variability of the data.
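A minimal base-R sketch of these derived columns. The column names `neighborhood`, `sq_mt_built` (floor area in square metres) and `buy_price` are assumptions about the Kaggle file, and the four-row data frame is made-up toy data standing in for it.

```r
# Toy stand-in for the Kaggle data; column names and values are hypothetical.
madrid <- data.frame(
  neighborhood = c("Centro", "Salamanca", "Centro", "Retiro"),
  sq_mt_built  = c(60, 120, 85, 100),
  buy_price    = c(250000, 900000, 340000, 520000)
)

# Number of distinct areas (145 in the full data set).
n_areas <- length(unique(madrid$neighborhood))

# Convert floor area to square feet: 1 square metre = 10.7639 square feet.
madrid$sq_ft <- madrid$sq_mt_built * 10.7639

# Price per square foot puts listings of different sizes on a comparable scale.
madrid$buy_price_per_sq_ft <- madrid$buy_price / madrid$sq_ft

print(n_areas)  # 3
print(round(madrid$buy_price_per_sq_ft, 2))
```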

Since we are interested in the buy price, we plotted the distribution of buy_price_per_sq_ft (the number of listings at each price level) to check whether it is approximately normal.

[Figure: Price Per Sq Ft]

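The plot shows that buy_price_per_sq_ft is bimodal. Ordinary least-squares regression models the conditional mean and, for its usual inference, assumes normally distributed residuals (the response itself need not be normal); with a bimodal, skewed target, a single conditional mean can summarize the data poorly. We therefore use quantile regression, which models chosen conditional quantiles (such as the median) and does not assume normal errors.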
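To make the idea concrete: linear quantile regression at level tau chooses coefficients that minimize the "check" (pinball) loss rho_tau(u) = u * (tau - I(u < 0)) over the residuals u, instead of the squared error. The base-R sketch below fits only an intercept, in which case the minimizer is the tau-th sample quantile. A full model with predictors would normally be fit with a dedicated routine such as `rq()` from the quantreg package; which function the original analysis used is not stated, and the toy prices are made up.

```r
# Check (pinball) loss for quantile level tau, summed over residuals u.
check_loss <- function(u, tau) sum(u * (tau - (u < 0)))

# Intercept-only quantile regression: find the constant q minimizing the check loss.
fit_intercept_quantile <- function(y, tau) {
  optimize(function(q) check_loss(y - q, tau), interval = range(y))$minimum
}

prices <- c(1, 2, 3, 10, 50)  # toy values, right-skewed like real prices

q50 <- fit_intercept_quantile(prices, 0.5)
q25 <- fit_intercept_quantile(prices, 0.25)
print(c(q50 = q50, q25 = q25))  # close to 3 and 2, the sample median and 25% quantile
```

Unlike the mean (about 13.2 here), the fitted median is not pulled toward the single large value, which is one reason quantile models suit skewed price data.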

Next, I cleaned the data further: I removed all NA, NaN and Inf values, converted text columns to factors, and then converted all columns to numeric.
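A base-R sketch of this cleaning step on a made-up data frame; the column names `house_type`, `n_rooms` and `buy_price` are hypothetical. Note that `as.numeric()` on a factor returns its integer level codes, so after this step categories behave like numbers in a linear model.

```r
# Toy data containing the kinds of problem values mentioned above (hypothetical columns).
raw <- data.frame(
  house_type = c("flat", "chalet", "flat", NA, "duplex"),
  n_rooms    = c(2, 5, NaN, 3, 4),
  buy_price  = c(250000, 900000, 310000, 400000, Inf),
  stringsAsFactors = FALSE
)

# Keep rows whose numeric values are all finite (this drops NA, NaN and Inf)
# and whose text columns have no NA.
num_cols  <- vapply(raw, is.numeric, logical(1))
finite_ok <- apply(as.matrix(raw[num_cols]), 1, function(r) all(is.finite(r)))
clean <- raw[finite_ok & complete.cases(raw), ]

# Text columns -> factors, then every column -> numeric (factors become level codes).
clean[] <- lapply(clean, function(col) if (is.character(col)) factor(col) else col)
clean[] <- lapply(clean, as.numeric)

str(clean)
```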

[Figure: Cleaned Dataset]

I then checked the correlations between the independent variables and, to reduce multicollinearity, removed variables whose correlation coefficient with another predictor was greater than 0.7.
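One way to automate this filter, as a hedged base-R sketch: walk through the predictors and keep each one only if it is not too correlated with any predictor already kept. Comparing the absolute correlation, the greedy order, and the toy variable names (`sq_mt`, `n_rooms`, `n_baths`) are all assumptions, not necessarily what the original analysis did.

```r
# Greedy filter: keep a predictor only if its absolute correlation with every
# predictor already kept is at most `cutoff`. Using abs() (so strong negative
# correlations also count) is an assumption.
drop_correlated <- function(X, cutoff = 0.7) {
  r <- abs(cor(X))
  kept <- character(0)
  for (v in colnames(X)) {
    if (all(r[v, kept] <= cutoff)) kept <- c(kept, v)
  }
  X[, kept, drop = FALSE]
}

# Toy predictors (hypothetical names): n_rooms is driven by floor area, n_baths is not.
set.seed(1)
n <- 200
sq_mt   <- runif(n, 40, 200)
n_rooms <- round(sq_mt / 30 + rnorm(n, sd = 0.3))
n_baths <- sample(1:3, n, replace = TRUE)
X <- data.frame(sq_mt, n_rooms, n_baths)

reduced <- drop_correlated(X, cutoff = 0.7)
print(round(cor(X), 2))
print(colnames(reduced))  # "sq_mt" "n_baths"
```

Which member of a correlated pair survives depends on the column order, so in practice one might keep the variable that is easier to interpret.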

I then plotted the buy price against the number of rooms to get a general view of the data and to check for outliers.
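The original inspects this plot visually. A common numeric complement, shown here as a sketch on made-up listings, is the boxplot rule implemented by base R's `boxplot.stats()`; it only flags candidates to inspect, it does not decide whether a listing is an error.

```r
# Toy listings (hypothetical values); one price is far above the rest.
n_rooms   <- c(1, 2, 2, 3, 3, 3, 4)
buy_price <- c(210000, 250000, 265000, 300000, 320000, 340000, 4500000)

# boxplot.stats() flags points more than coef times the hinge spread (roughly the IQR)
# beyond the lower or upper hinge, i.e. the points a boxplot would draw individually.
flagged <- boxplot.stats(buy_price, coef = 1.5)$out
print(flagged)  # only the 4,500,000 listing

# Show the flagged listings together with their room counts for manual review.
print(data.frame(n_rooms, buy_price)[buy_price %in% flagged, ])
```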

[Figure: Price vs. Rooms]

I also plotted a histogram of the dependent variable, the buy price.

[Figure: Histogram of buy price]

I also made a pairwise scatter plot of the independent variables to inspect their correlations.

[Figure: Pairwise scatter plot of the independent variables (indep_scatterplot)]

We then split the data into training and test sets and use the training set to fit the model. After fitting the quantile regression, we compare candidate models using the AIC (Akaike information criterion), which penalizes extra parameters, to avoid over-fitting, and arrive at our final model.
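A hedged base-R sketch of both steps. The 80/20 split ratio and seed are assumptions (the original does not state them), and the AIC shown is one standard construction for quantile regression: treat the residuals as asymmetric-Laplace distributed and plug in the maximum-likelihood scale, which equals the mean check loss. If the fitting package provides its own AIC method, that should be preferred. The residuals below are made up.

```r
# Hypothetical 80/20 random split of row indices.
split_train_test <- function(n, train_frac = 0.8, seed = 42) {
  set.seed(seed)
  train <- sort(sample.int(n, size = floor(train_frac * n)))
  list(train = train, test = setdiff(seq_len(n), train))
}

# Check (pinball) loss for quantile level tau.
check_loss <- function(u, tau) sum(u * (tau - (u < 0)))

# AIC of a quantile regression fit under an asymmetric Laplace likelihood, with
# the scale set to its maximum-likelihood estimate sigma = mean check loss:
# logLik = n * (log(tau * (1 - tau)) - log(sigma) - 1).
quantile_aic <- function(residuals, tau, n_params) {
  n <- length(residuals)
  sigma_hat <- check_loss(residuals, tau) / n
  log_lik <- n * (log(tau * (1 - tau)) - log(sigma_hat) - 1)
  -2 * log_lik + 2 * n_params
}

idx <- split_train_test(100)
print(c(train = length(idx$train), test = length(idx$test)))  # 80 and 20

# Made-up training residuals for two candidate models: the larger model
# shrinks every residual by only 1%, at the cost of 7 extra parameters.
set.seed(7)
res_small <- rnorm(80)
res_big   <- 0.99 * res_small
aic_small <- quantile_aic(res_small, tau = 0.5, n_params = 3)
aic_big   <- quantile_aic(res_big,   tau = 0.5, n_params = 10)
print(c(small = aic_small, big = aic_big))  # lower AIC is preferred
```

Here the small improvement in fit does not pay for the extra parameters, so the smaller model has the lower AIC, which is the kind of over-fitting check described above.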

[Figure: Final Model]

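Next, we use the model to predict the prices in the test set and evaluate the predictions with a gain curve, which sorts the test properties by predicted price and plots the cumulative share of the total actual price against the share of properties.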

[Figure: Gain Curve]

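A relative Gini score close to 1 means the model sorts the test properties by price almost as well as a perfect ranking would. Our relative Gini score is 0.94, which suggests the model ranks properties by price well. Because the score measures ordering only, it does not by itself show how close the predicted prices are to the actual ones.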
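One common way to compute a relative Gini score from a gain curve, as a base-R sketch: take the area between the model's gain curve and the diagonal, and divide it by the same area for the ideal curve obtained by sorting on the actual prices. We are assuming this is the definition behind the reported 0.94, since the tool used is not stated; the prices and predictions below are made up.

```r
# Area under a gain curve: items sorted by score (highest first); x is the
# fraction of items taken, y the fraction of the total outcome they account for.
# Computed with the trapezoid rule, starting from (0, 0).
gain_area <- function(score, outcome) {
  ord <- order(score, decreasing = TRUE)
  y <- c(0, cumsum(outcome[ord]) / sum(outcome))
  x <- c(0, seq_along(outcome) / length(outcome))
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Relative Gini: the model's area above the diagonal (0.5) divided by that of a
# perfect ranking, which sorts by the actual outcome itself.
relative_gini <- function(predicted, actual) {
  (gain_area(predicted, actual) - 0.5) / (gain_area(actual, actual) - 0.5)
}

# Toy test-set prices and predictions: the model swaps the two most expensive.
actual    <- c(100, 300, 200, 500, 400)
predicted <- c(110, 280, 250, 450, 460)
print(relative_gini(predicted, actual))  # 0.9
```

A perfect ranking scores 1 and a completely reversed ranking scores -1. Rescaling all predictions by the same factor leaves the score unchanged, which is why it says nothing about the size of the prediction errors.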

Finally, we built a web page with R Shiny that uses the model to predict the price of a property in Madrid (in euros).

This is what the web page looks like:

[Figure: Web Page]

You can also try the app at the following link:

Price Prediction of Madrid Real Estate