Project Name

Surprise Housing - Advanced Regression Assignment

General Information
Technologies Used
Conclusions
Acknowledgements

General Information

A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual value and flip them at a higher price. To that end, the company has collected a data set from the sale of houses in Australia. The data is provided in a CSV file. The company is looking at prospective properties to buy in order to enter the market. You are required to build a regression model using regularisation to predict the actual value of the prospective properties and decide whether or not to invest in them. The company wants to know:

a) Which variables are significant in predicting the price of a house
b) How well those variables describe the price of a house

Also, determine the optimal value of lambda for ridge and lasso regression.
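One common way to choose lambda is a cross-validated grid search. The sketch below is only an illustration under stated assumptions: it uses synthetic data rather than train.csv, the candidate grid and the scoring metric are arbitrary choices, and `best_alpha` is a hypothetical helper. Note that scikit-learn names the lambda hyperparameter `alpha`.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the housing data (assumption: NOT the real train.csv).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Candidate lambda values; scikit-learn calls this hyperparameter `alpha`.
param_grid = {"alpha": [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}


def best_alpha(model, X, y):
    """Return the alpha with the best 5-fold cross-validated score."""
    search = GridSearchCV(
        model, param_grid, scoring="neg_mean_absolute_error", cv=5
    )
    search.fit(X, y)
    return search.best_params_["alpha"]


ridge_alpha = best_alpha(Ridge(), X, y)
lasso_alpha = best_alpha(Lasso(max_iter=10000), X, y)
print("ridge:", ridge_alpha, "lasso:", lasso_alpha)
```

The values printed here depend entirely on the synthetic data and say nothing about the optimal lambda for the actual housing data set.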

For this use case we are using the train.csv data set and its data description.

Steps involved in Multiple Linear Regression

Step 1: Read the data from the source
Step 2: Understand the data
Step 3: Clean up the data, drop unwanted columns, check for duplicates, etc.
Step 4: EDA
Step 5: Perform outlier analysis
Step 6: Impute data wherever necessary
Step 7: Map all categorical variables to dummy variables
Step 8: Train and test split
Step 9: Scaling
Step 10: Feature selection
Step 11: Evaluate the test set
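The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it builds a small synthetic frame instead of reading train.csv, the target column name `SalePrice` and the `Neighborhood` categories other than StoneBr are assumptions, the price relationship is invented, and EDA, outlier analysis and feature selection are omitted for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1 stand-in: a synthetic frame instead of pd.read_csv("train.csv").
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "OverallQual": rng.integers(1, 11, size=n),
    "LotArea": rng.normal(10000, 2000, size=n),
    "Neighborhood": rng.choice(["StoneBr", "NAmes", "OldTown"], size=n),
})
# "SalePrice" is an assumed target name; this relationship is invented.
df["SalePrice"] = (
    20000 * df["OverallQual"]
    + 5 * df["LotArea"]
    + 30000 * (df["Neighborhood"] == "StoneBr")
    + rng.normal(0, 10000, size=n)
)
df.loc[df.index[:10], "LotArea"] = np.nan  # simulate missing values

# Step 3: drop duplicate rows.
df = df.drop_duplicates()
# Step 6: impute missing numeric values with the column median.
df["LotArea"] = df["LotArea"].fillna(df["LotArea"].median())
# Step 7: map categorical variables to dummy variables.
df = pd.get_dummies(df, columns=["Neighborhood"], drop_first=True, dtype=int)

# Step 8: train and test split.
X = df.drop(columns="SalePrice")
y = df["SalePrice"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Step 9: scaling, fitted on the training set only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 11: fit a regularised model and evaluate it on the test set.
model = Ridge(alpha=1.0)
model.fit(X_train_scaled, y_train)
test_r2 = r2_score(y_test, model.predict(X_test_scaled))
print(f"Test R^2: {test_r2:.3f}")
```

Fitting the scaler on the training split only keeps test-set statistics out of the model; the `alpha=1.0` used here is a placeholder, not a tuned value.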

Conclusions

The top features that contribute significantly to the predicted price are listed below:

OverallQual
OverallCond
YearBuilt
Neighborhood_StoneBr
Exterior1st_BrkFace
TotalBsmtSF
LotArea
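A list like the one above is typically obtained by ranking the coefficients of a regularised model fitted on scaled features. The sketch below shows one way to do that with lasso; it is an assumption about the approach, not the notebook's code. The data are synthetic, the coefficients are invented, and `noise_a` and `noise_b` are hypothetical filler columns, so the resulting order does not reproduce the ranking reported here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data with invented effects (assumption: NOT the housing data).
rng = np.random.default_rng(1)
feature_names = [
    "OverallQual", "OverallCond", "YearBuilt", "LotArea", "noise_a", "noise_b"
]
X = rng.normal(size=(400, len(feature_names)))
y = (
    5.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + 0.5 * X[:, 3]
    + rng.normal(scale=0.5, size=400)
)

# Scale first so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)

# Rank features by absolute coefficient, largest first.
coefs = pd.Series(lasso.coef_, index=feature_names)
ranked = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(ranked)
```

Because lasso can shrink coefficients exactly to zero, features that end up with a zero coefficient can be treated as dropped by the model.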

More details and analysis are available in "Rakesh_Krishnamurthy.ipynb" and "AdvancedRegressionPart2".

Technologies Used

pandas - version 1.4.4
seaborn - version 0.12.2
numpy - version 1.23.5
matplotlib - version 3.7.0
sklearn - version 0.0.post5

Acknowledgements

Contributors

Rakesh Krishnamurthy

Contact

Created by [@rakekris] - feel free to contact me!