This project predicts the final sale price of a house. The data comes from Kaggle and consists of 1460 observations with 81 variables. The predictors describe various features of the house, and the data frame has a single output variable, 'Sale Price'. Data cleaning includes introducing a new class for missing categorical values and imputing missing numerical values with the column mean. Scatter plots, violin plots, box plots, and bar graphs, among others, are used to explore the relationships between 'Sale Price' and the predictors. Linear Regression, Ridge Regression, and Lasso Regression are fitted to identify the positive and negative coefficients that influence the final sale price. Cross-validation is used to obtain an RMSE (root mean squared error) score for each algorithm, and the algorithm with the lowest RMSE is taken as the best. Regression and residual plots visualize the model's performance on the test data.
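The cleaning and model-comparison steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. The data is synthetic, and the column names (`GrLivArea`, `LotFrontage`, `GarageType`, `SalePrice`), the `"Missing"` label, the `alpha` values, and the 5-fold setting are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Kaggle data; names and values are illustrative only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n),
    "LotFrontage": rng.normal(70, 20, n),
    "GarageType": rng.choice(["Attchd", "Detchd"], n),
})
df["SalePrice"] = (100 * df["GrLivArea"] + 500 * df["LotFrontage"]
                   + rng.normal(0, 10000, n))

# Knock out some values to mimic missing data.
df.loc[rng.choice(n, 20, replace=False), "LotFrontage"] = np.nan
df.loc[rng.choice(n, 15, replace=False), "GarageType"] = np.nan

# Missing categorical values get their own class (assumed label "Missing").
df["GarageType"] = df["GarageType"].fillna("Missing")
# Missing numerical values are replaced by the column mean.
df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].mean())

# One-hot encode categoricals so the linear models can use them.
X = pd.get_dummies(df.drop(columns="SalePrice"), drop_first=True).astype(float)
y = df["SalePrice"]

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0, max_iter=10000),
}

# 5-fold cross-validated RMSE for each model (scikit-learn returns it negated).
rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()

best = min(rmse, key=rmse.get)  # model with the lowest mean RMSE
print(rmse, best)
```

For brevity the mean is computed on the full data frame before cross-validation. Wrapping the imputation and the model in a scikit-learn `Pipeline` would keep each fold's statistics separate.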
NikhilaThota/CapstoneProject_House_Prices_Prediction
Explores the relationships between house features and the sale price using exploratory data analysis and statistical analysis. Applies Multiple Linear Regression, Ridge Regression, and Lasso Regression in combination with cross-validation, performs parameter tuning, compares the test scores, and suggests the best model for predicting the final sale price of a house. Seaborn is used for plotting and the scikit-learn package for the statistical analysis.
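One way the parameter tuning and test-score comparison might look is sketched below, using a cross-validated grid search over the Lasso regularization strength. The synthetic data, the `alpha` grid, the use of `StandardScaler`, and the 75/25 split are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with known positive, negative, and zero coefficients.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ true_coef + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features, then fit Lasso; alpha is tuned by 5-fold cross-validation.
pipe = make_pipeline(StandardScaler(), Lasso(max_iter=10000))
grid = GridSearchCV(
    pipe,
    {"lasso__alpha": [0.001, 0.01, 0.1, 1.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X_train, y_train)

best_alpha = grid.best_params_["lasso__alpha"]
# Coefficient signs show which features push the prediction up or down.
coefs = grid.best_estimator_.named_steps["lasso"].coef_
# Held-out RMSE and residuals, as would feed a residual plot.
pred = grid.predict(X_test)
residuals = y_test - pred
test_rmse = np.sqrt(mean_squared_error(y_test, pred))
print(best_alpha, coefs, test_rmse)
```

The same grid search can be repeated with `Ridge` in place of `Lasso`, and the resulting test RMSE values compared to pick a final model.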
Jupyter Notebook