Predicting house prices using machine learning models.
The primary goal is to build and evaluate various models to predict house prices based on the housing price dataset.
The models used include:
- Linear Regression
- Lasso (L1)
- Ridge (L2)
- Random Forest Regression
The dataset used is sourced from Kaggle and includes various features related to house prices.
- Reading the Data:
  - Load and inspect the dataset.
  - The raw data is located in the data directory under the name housing.csv.
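As a rough illustration of the loading step, the sketch below reads the CSV with pandas and prints a quick overview. The path comes from the description above. The helper name `load_housing` and the sample columns are assumptions made so the example runs without the real file.

```python
import io
from pathlib import Path

import pandas as pd

# Location described above: the data directory, file housing.csv.
DATA_PATH = Path("data") / "housing.csv"


def load_housing(source=DATA_PATH):
    """Read the raw CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Stand-in rows so the sketch runs anywhere; these column names are hypothetical.
sample_csv = io.StringIO(
    "price,sqft,lat,long\n"
    "300000,1500,47.5,-122.2\n"
    "450000,2100,47.6,-122.3\n"
)
df = load_housing(sample_csv)

print(df.head())  # first rows
df.info()         # column names, dtypes, non-null counts
```

With the repository checked out, calling `load_housing()` with no argument would read `data/housing.csv` instead of the stand-in rows.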
- Exploratory Data Analysis (EDA):
  - Explore the data size.
  - Explore the data types and check for empty values.
  - Analyze the data by visualizing its features.
  - Analyze the correlation between the features.
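The EDA checks listed above might look roughly like this in pandas. The DataFrame here is a small hypothetical sample, not the project's data, and the column names are placeholders.

```python
import pandas as pd

# Hypothetical sample; the real columns come from housing.csv.
df = pd.DataFrame({
    "price": [300000, 450000, 520000, 610000],
    "sqft": [1500, 2100, None, 2900],
    "city": ["Seattle", "Kent", "Seattle", "Renton"],
})

n_rows, n_cols = df.shape                  # data size
dtypes = df.dtypes                         # data type of each column
missing = df.isna().sum()                  # empty values per column
corr = df.select_dtypes("number").corr()   # Pearson correlation, numeric columns only

print(n_rows, n_cols)
print(dtypes)
print(missing)
print(corr)
```

For the visual part, a correlation matrix like `corr` can be passed to `seaborn.heatmap`, and single features can be plotted with `DataFrame.hist`.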
- Data Preprocessing:
  - Convert lat & long coordinates to city using the LocationIQ API.
  - Replace Unknown city values with the nearest city using NearestNeighbors, based on lat & long coordinates.
  - Remove unnecessary features.
  - Detect and handle outliers using IQR & KNN.
  - Encode categorical features using One-Hot Encoding.
  - Split the dataset into inputs & outputs, then split it into train & test sets.
  - Ensure the data is ready for training and evaluation.
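A minimal sketch of several preprocessing steps, under stated assumptions: the rows are made up, `price` is assumed to be the target column, outliers are dropped with the common 1.5 × IQR rule, and `pandas.get_dummies` stands in for whichever one-hot encoder the notebook uses. The KNN-based outlier step is not shown.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

# Hypothetical rows; "price" as the target column is an assumption.
df = pd.DataFrame({
    "lat":   [47.60, 47.61, 47.38, 47.39, 47.62],
    "long":  [-122.33, -122.34, -122.23, -122.24, -122.32],
    "city":  ["Seattle", "Seattle", "Kent", "Unknown", "Unknown"],
    "price": [500000, 520000, 350000, 360000, 5000000],
})

# 1) Replace "Unknown" cities with the city of the nearest labelled point.
#    Euclidean distance on raw lat/long is only a rough approximation.
known = df[df["city"] != "Unknown"]
unknown = df[df["city"] == "Unknown"]
nn = NearestNeighbors(n_neighbors=1).fit(known[["lat", "long"]].to_numpy())
_, idx = nn.kneighbors(unknown[["lat", "long"]].to_numpy())
df.loc[unknown.index, "city"] = known["city"].to_numpy()[idx[:, 0]]

# 2) Drop rows whose price lies more than 1.5 * IQR outside the quartiles.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 3) One-hot encode the categorical column.
encoded = pd.get_dummies(clean, columns=["city"])

# 4) Split into inputs and outputs, then into train and test sets.
X = encoded.drop(columns=["price"])
y = encoded["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(df["city"].tolist())
print(len(clean), X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split reproducible between runs, which makes model comparisons easier to interpret.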
- Model Training & Evaluation:
  - Train multiple models using a Pipeline:
    - Linear Regression
    - Lasso (L1)
    - Ridge (L2)
    - Random Forest Regressor
  - Evaluate the performance of each model using the RMSE loss function and R² score.
  - Compare predicted values with actual values.
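The training and evaluation loop could be sketched as follows. The data is synthetic, and the `StandardScaler` step and all hyperparameters are assumptions chosen for illustration; the notebook's actual pipeline may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the prepared housing features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters below are illustrative, not the project's settings.
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

results = {}
for name, model in models.items():
    # The same preprocessing step is applied in front of every estimator.
    pipe = Pipeline([("scale", StandardScaler()), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))  # RMSE = sqrt(MSE)
    r2 = float(r2_score(y_test, pred))
    results[name] = {"rmse": rmse, "r2": r2}
    print(f"{name}: RMSE={rmse:.3f}, R2={r2:.3f}")
```

Wrapping each estimator in a `Pipeline` means the scaler is fitted on the training split only, so no information from the test set leaks into preprocessing.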
- Save & Load the Models:
  - Save the models to .pkl files inside the models directory.
  - Load the models and make predictions.
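Saving and reloading a fitted model as a `.pkl` file can be done with the standard `pickle` module, as sketched below. The helper names and the file name `linear_regression.pkl` are hypothetical, and a temporary directory stands in for the project's models directory.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.linear_model import LinearRegression


def save_model(model, path):
    """Pickle a fitted model, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a pickled model back into memory."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Tiny model fitted on y = 2x + 1 so the round trip is easy to verify.
model = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "models" / "linear_regression.pkl"  # hypothetical name
    save_model(model, target)
    restored = load_model(target)

prediction = float(restored.predict([[3.0]])[0])
print(round(prediction, 3))
```

Pickle files can execute arbitrary code when loaded, so only load `.pkl` files from sources you trust, and load them with the same scikit-learn version that produced them where possible.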
- pandas: Data manipulation and analysis.
- numpy: Numerical operations on arrays.
- scikit-learn: Machine learning algorithms and tools.
- matplotlib: Plotting and visualization.
- seaborn: Data visualization.
- requests: Sending HTTP requests (used for LocationIQ API).
To start using this project, follow the steps below:
- Clone this repository to your local machine:
  git clone git@github.com:IsmaelMousa/house-price-prediction.git
- Navigate to the house-price-prediction directory:
  cd house-price-prediction
- Set up a virtual environment:
  python3 -m venv .venv
- Activate the virtual environment:
  source .venv/bin/activate
- Install the required dependencies:
  pip install -r requirements.txt
- Run the Jupyter Notebook:
  jupyter-notebook
Important
Reverse geocoding uses the LocationIQ API. To run that step, sign up for a LocationIQ account and replace the placeholder with your token:
key = "YOUR_LOCATIONIQ_API_KEY"
If you do not need reverse geocoding, you can skip this step.
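For orientation, a reverse-geocoding call might be structured as in the sketch below. The endpoint URL, the query parameter names, and the `address` fields in the response are assumptions based on LocationIQ's public reverse-geocoding API and should be checked against its documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm the host and path in the LocationIQ documentation.
BASE_URL = "https://us1.locationiq.com/v1/reverse"
key = "YOUR_LOCATIONIQ_API_KEY"


def build_reverse_url(lat, lon, api_key=key):
    """Build the request URL for one coordinate pair (parameter names assumed)."""
    params = {"key": api_key, "lat": lat, "lon": lon, "format": "json"}
    return f"{BASE_URL}?{urlencode(params)}"


def extract_city(payload):
    """Pick a city-like field from the response, falling back to 'Unknown'."""
    address = payload.get("address", {})
    for field in ("city", "town", "village"):  # assumed response fields
        if address.get(field):
            return address[field]
    return "Unknown"


def reverse_geocode(lat, lon, api_key=key):
    """Call the API and return the city name; needs network access and a valid key."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.get(build_reverse_url(lat, lon, api_key), timeout=10)
    response.raise_for_status()
    return extract_city(response.json())


# Offline check of the parsing helper with a hand-written payload.
print(extract_city({"address": {"city": "Seattle"}}))
print(extract_city({}))
```

Returning `"Unknown"` when no city-like field is present matches the preprocessing step that later fills Unknown cities from the nearest neighbour.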
The results include performance metrics for each model and comparisons of predicted versus actual values.
Detailed analysis and model performance summaries are provided in main.ipynb.
The dataset used in this project is available on Kaggle: House Prices.