A regression based project for predicting house 🏠 price💲.
The purpose of this project is to study the relationship between different factors(dependent variables) that influence the price(target variable) of houses in India and predict house price using different Machine Learning algorithms.
- Python
- matplotlib
- seaborn
- numpy
- pandas
- scikit-learn
- scipy
- etc.
This is an Indian house price predicting regression based project.
Dataset can be downloaded from kaggle
-
First the data set is checked for any missing or null values. If there's any any, then they are either dropped or replaced with mean or median as per analysis.
-
We can see that of all the features, Area has maximum influence on the target variable i.e price.
-
Then we analyse the categorical variables.
- Then we remove the outliers.
- Visualizing the area feature using boxplot.
- To select the fetures that influence the target variable, corr() command is used to look into the relationship of the dependent variables with the target variable.
- And heatmap is used for proper visual representation.¶
- The features that has the maximum influence on the target variable (price) are considered and rest all are dropped.
- Then the outliers are removed using IQR(Interquantile Range) method.
- The dataset is then scaled using StandardScaler.
- And dataset is split into train and test sets using train_test_split method.
- Using sklearn to train the model on the training data set and testing on the test data set.
- Metrics used: RMSE for evaluation of the model.