This project predicts house prices for about 200 apartments in Pune from their property features. It compares several regression models, including Linear Regression, Random Forest, and XGBoost, along with multi-layer perceptron (MLP) models built with scikit-learn and TensorFlow.
The dataset has around 200 rows and 17 variables that influence the target variable, the house price.
- Language: Python
- Libraries: scikit-learn, pandas, NumPy, matplotlib, seaborn, xgboost
- Import required libraries and load the dataset.
- Perform preliminary data exploration.
- Identify and remove outliers.
- Remove redundant feature columns.
- Handle missing values.
- Standardize categorical columns (consistent labels and formatting).
- Save the cleaned data.
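
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the column names (`area_sqft`, `locality`, `price_lakhs`, `listing_id`), the IQR outlier rule, and the median imputation are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw dataset; column names are assumptions.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "area_sqft": rng.normal(900, 150, 50).round(),
    "locality": rng.choice(["Baner", " baner", "Wakad", "WAKAD "], 50),
    "price_lakhs": rng.normal(80, 10, 50).round(1),
})
raw["listing_id"] = range(50)        # identifier with no predictive value
raw.loc[0, "price_lakhs"] = 900.0    # injected outlier
raw.loc[1:3, "area_sqft"] = np.nan   # injected missing values


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps in order and return a new DataFrame."""
    # Remove a redundant column.
    df = df.drop(columns=["listing_id"])

    # Remove price outliers with the 1.5 * IQR rule (one common choice).
    q1, q3 = df["price_lakhs"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["price_lakhs"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    df = df[in_range].copy()

    # Fill missing numeric values with the column median.
    df["area_sqft"] = df["area_sqft"].fillna(df["area_sqft"].median())

    # Make category labels consistent (trim whitespace, unify casing).
    df["locality"] = df["locality"].str.strip().str.title()
    return df.reset_index(drop=True)


cleaned = clean(raw)
# To persist the result (the file name is hypothetical):
# cleaned.to_csv("cleaned_data.csv", index=False)
```

Whether the IQR rule or median imputation suits the real data depends on what the preliminary exploration shows; both are placeholders here.
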
- Import the cleaned dataset.
- Convert binary columns to dummy variables.
- Perform feature engineering.
- Conduct univariate and bivariate analysis.
- Check for correlations.
- Select relevant features.
- Scale the data.
- Save the final updated dataset.
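
As a rough sketch of the analysis steps above (dummy encoding, correlation check, feature selection, scaling), assuming synthetic data with hypothetical column names and an arbitrary correlation threshold of 0.1:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset; column names are assumptions.
rng = np.random.default_rng(1)
n = 60
area = rng.normal(900, 150, n)
data = pd.DataFrame({
    "area_sqft": area,
    "bedrooms": rng.integers(1, 4, n),
    "has_gym": rng.choice(["yes", "no"], n),
    "price_lakhs": 0.08 * area + rng.normal(0, 5, n),
})

# Convert the binary yes/no column into a single 0/1 dummy column.
data = pd.get_dummies(data, columns=["has_gym"], drop_first=True, dtype=int)

# Correlation of every feature with the target.
target_corr = data.corr()["price_lakhs"].drop("price_lakhs")

# Keep features whose absolute correlation clears an (arbitrary) threshold.
selected = target_corr[target_corr.abs() >= 0.1].index.tolist()

# Standardize the selected features to zero mean and unit variance.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(data[selected]), columns=selected)
```

In a real pipeline the scaler is usually fitted on the training split only, so that test-set statistics do not leak into training.
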
- Prepare the data.
- Split the dataset into training and testing sets.
- Build various regression models, including Linear Regression, Ridge Regression, Lasso Regression, Elastic Net, Random Forest Regressor, XGBoost Regressor, K-Nearest Neighbours Regressor, and Support Vector Regressor.
- Assess model performance using metrics such as Mean Squared Error (MSE) and R² score.
- Create residual plots for both training and testing data.
- Perform grid search and cross-validation for the chosen regressor.
- Fit the model and make predictions on the test data.
- Inspect feature importances to identify the most influential factors in predicting house prices.
- Compare the performance of different models to choose the best one.
- Build MLP Regression models using both scikit-learn and TensorFlow.
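
The model-building steps above might look roughly like the sketch below. It uses synthetic data from `make_regression` in place of the real dataset and only scikit-learn estimators; the hyperparameter grid and the choice of Random Forest for tuning are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic regression data standing in for the prepared dataset.
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "Elastic Net": ElasticNet(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "K-Nearest Neighbours": KNeighborsRegressor(),
    "Support Vector": SVR(),
}

# Fit each model and record its test-set MSE and R² score.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({"model": name,
                 "mse": mean_squared_error(y_test, pred),
                 "r2": r2_score(y_test, pred)})
results = pd.DataFrame(rows).sort_values("mse").reset_index(drop=True)

# Grid search with 5-fold cross-validation for one chosen regressor.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=5,
    scoring="neg_mean_squared_error",
)
grid.fit(X_train, y_train)
best_rf = grid.best_estimator_

# Feature importances of the tuned model (they sum to 1).
importances = best_rf.feature_importances_
```

`results` then lists the models from lowest to highest test MSE, which supports the comparison step. `xgboost.XGBRegressor` and `sklearn.neural_network.MLPRegressor` expose the same `fit`/`predict` interface and could be added to the `models` dictionary in the same way; they are left out here to keep the sketch short.
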
To run this project, follow these steps:
- Install the required libraries listed in `requirements.txt`.
- Execute the code in the `src` folder.