End-to-End Machine Learning: Boston House Price Prediction

This project demonstrates a complete machine learning pipeline for predicting house prices with the Boston Housing dataset. It covers data preprocessing, exploratory analysis, model training, evaluation, and deployment as a Flask web application.

Project Structure
Key Features
Project Stages
Getting Started
Future Enhancements

Project Structure

├── app.py            # Flask app for model deployment
├── model.ipynb       # Jupyter notebook: data analysis, preprocessing, training, and evaluation
├── regmodel.pkl      # Saved trained machine learning model
├── scaler.pkl        # Saved data scaler for consistent input transformation
├── requirements.txt  # Project dependencies
├── Dockerfile        # Instructions for building a Docker image
├── Procfile          # Process management for deployment (e.g., on Heroku)
└── templates/        # HTML templates for the web app
    └── home.html     # Main page for user interaction

Key Features

End-to-end ML pipeline: Covers all stages from raw data to deployed model.
Missing value imputation: Employs KNN and MICE imputation to handle missing data effectively.
Exploratory Data Analysis (EDA): Uses visualizations and correlation analysis to gain insights.
Linear Regression model: A simple, interpretable baseline for predicting housing prices.
Model evaluation: Reports MAE, MSE, RMSE, R-squared, and adjusted R-squared on held-out test data.
Flask web app: Provides an interactive platform for users to input data and get predictions.
Dockerization: Simplifies deployment and ensures environment consistency.

Project Stages

1. Data Analysis and Preprocessing (model.ipynb)

Data Loading: Retrieves the Boston Housing dataset from a reliable source.
Exploratory Data Analysis (EDA):
- Visualizes feature distributions using histograms, scatter plots, etc.
- Calculates correlation between features and target variable to understand relationships.
- Identifies potential multicollinearity (high correlation between independent features).
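
The correlation and multicollinearity checks above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names (`rooms`, `tax`, `highway_access`, `price`) and the 0.8 threshold are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data; column names are illustrative only.
rng = np.random.default_rng(0)
n = 200
rooms = rng.normal(6.0, 0.7, n)
tax = rng.normal(400.0, 150.0, n)
highway_access = 0.05 * tax + rng.normal(0.0, 2.0, n)  # deliberately tied to tax
price = 9.0 * rooms - 0.01 * tax + rng.normal(0.0, 2.0, n)

df = pd.DataFrame(
    {"rooms": rooms, "tax": tax, "highway_access": highway_access, "price": price}
)

corr = df.corr()  # Pearson correlation matrix

# Correlation of each feature with the target, strongest (by magnitude) first.
target_corr = corr["price"].drop("price").sort_values(key=np.abs, ascending=False)

# Feature pairs whose absolute correlation exceeds the chosen threshold.
features = corr.drop(index="price", columns="price")
threshold = 0.8  # assumed cutoff; tune per project
collinear_pairs = [
    (a, b, round(features.loc[a, b], 2))
    for i, a in enumerate(features.columns)
    for b in features.columns[i + 1:]
    if abs(features.loc[a, b]) > threshold
]

print(target_corr)
print(collinear_pairs)
```

Pairs flagged this way are candidates for dropping or combining, since strongly correlated inputs make linear regression coefficients unstable.
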
Missing Value Handling:
- Uses KNN imputation to fill each missing value from the most similar rows (nearest neighbors).
- Applies MICE (Multivariate Imputation by Chained Equations), which models each feature with missing values as a function of the other features.
Data Splitting: Divides the data into training and testing sets for model building and validation.
Feature Scaling: Standardizes features to zero mean and unit variance using StandardScaler so that all features are on a comparable scale.
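
A minimal sketch of the imputation, splitting, and scaling steps with scikit-learn, run on synthetic data. Two assumptions to note: scikit-learn's `IterativeImputer` is used here as a MICE-style imputer (the notebook may use a different implementation), and the sketch splits before imputing so that the imputer and scaler learn only from training rows, which may differ from the order used in model.ipynb.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix with roughly 5% of entries set to NaN.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = X @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(scale=0.5, size=300)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.05] = np.nan

# Split first so that test rows never influence the fitted imputer or scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.2, random_state=42
)

# Option 1: KNN imputation, averaging each missing entry over the 5 nearest rows.
knn = KNNImputer(n_neighbors=5)
X_train_knn = knn.fit_transform(X_train)
X_test_knn = knn.transform(X_test)

# Option 2: MICE-style imputation, regressing each feature on the others in rounds.
mice = IterativeImputer(max_iter=10, random_state=0)
X_train_mice = mice.fit_transform(X_train)
X_test_mice = mice.transform(X_test)

# Standardize using statistics from the training set only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_mice)
X_test_scaled = scaler.transform(X_test_mice)
```

The fitted scaler is the object that would be pickled (scaler.pkl in this project) so that the web app can apply the same transformation at inference time.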

2. Model Training and Evaluation (model.ipynb)

Model Selection: A Linear Regression model is chosen for its simplicity and interpretability.
Model Training: The model learns patterns from the training data to predict house prices.
Model Prediction: Predictions are made on the unseen test data to evaluate performance.
Performance Metrics:
- Mean Absolute Error (MAE): Average absolute difference between predicted and actual values.
- Mean Squared Error (MSE): Average squared difference between predicted and actual values.
- Root Mean Squared Error (RMSE): Square root of MSE, expressed in the same units as the target.
- R-squared (R²): Proportion of variance in the target variable explained by the model.
- Adjusted R-squared: R² adjusted for the number of predictors, penalizing model complexity.
Residual Analysis: Examines residuals (differences between actual and predicted values) to check model assumptions such as linearity and constant error variance.
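
The training and evaluation steps above can be sketched as follows. The data is synthetic and the variable names are illustrative, so the numbers it prints say nothing about this project's actual results. Adjusted R² is computed by hand as 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of test rows and p the number of predictors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem standing in for the housing data.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 1.5]) + 22.0 + rng.normal(scale=2.0, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # same units as the target
r2 = r2_score(y_test, y_pred)

# Adjusted R-squared penalizes R-squared for the number of predictors p.
n, p = X_test.shape
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Residuals (actual minus predicted); ideally centered on zero with no pattern.
residuals = y_test - y_pred

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f}")
print(f"R2={r2:.3f} adjusted R2={adjusted_r2:.3f}")
print(f"mean residual={residuals.mean():.3f}")
```

Plotting `residuals` against `y_pred` (for example with a scatter plot) is a common way to spot non-linearity or non-constant variance.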

3. Model Deployment with Flask (app.py)

Flask App Creation: A simple Flask web application is created to serve the model.
Model Loading: The trained model and data scaler are loaded from pickle files for inference.
Routing:
- /: Renders the main HTML template (home.html) for user interaction.
- /predict: Handles POST requests, scales the input with the saved scaler, generates a prediction, and returns it to the user.
HTML Template (home.html):
- Creates a form for users to input feature values.
- Displays the model's prediction dynamically upon submission.
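
A minimal sketch of how the two routes could be wired together. This is not the contents of app.py: the `create_app` and `load_artifacts` helpers are hypothetical names, the form is assumed to submit one numeric field per feature in training-column order, and `/predict` returns JSON here (instead of re-rendering home.html) to keep the example self-contained.

```python
import pickle

import numpy as np
from flask import Flask, jsonify, render_template, request


def load_artifacts(model_path="regmodel.pkl", scaler_path="scaler.pkl"):
    """Load the pickled model and scaler listed in the project structure."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(scaler_path, "rb") as f:
        scaler = pickle.load(f)
    return model, scaler


def create_app(model, scaler):
    """Build a Flask app around an already-loaded model and scaler."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        # Serves the input form from templates/home.html.
        return render_template("home.html")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Assumes every form field is numeric and arrives in training-column order.
        try:
            values = [float(v) for v in request.form.values()]
        except ValueError:
            return jsonify(error="all inputs must be numeric"), 400
        features = np.array(values).reshape(1, -1)  # one row, one column per feature
        prediction = model.predict(scaler.transform(features))[0]
        return jsonify(prediction=float(prediction))

    return app


# In a real entry point this could be wired up as:
#   app = create_app(*load_artifacts())
```

Passing the model and scaler into `create_app` keeps the routes testable with stand-in objects, without needing the pickle files on disk.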

Getting Started

Prerequisites

Python 3.7+
pip (package installer for Python)

Installation

Clone the repository:

git clone https://github.com/your-username/End-to-End-Ml-Boston-house-pricing.git
cd End-to-End-Ml-Boston-house-pricing

Create a virtual environment (recommended):

python3 -m venv venv 
source venv/bin/activate

Install required libraries:
```
pip install -r requirements.txt
```

Running the Application

Run the Flask app:
```
flask run
```
Access the web application in your browser at http://127.0.0.1:5000/.

Future Enhancements

Experiment with More Models: Compare other regression algorithms (Ridge, Lasso, Random Forest) against the linear baseline.
Feature Engineering: Derive new features from existing ones (for example, interaction or polynomial terms) and check whether they improve test performance.
Web App Enhancement:
- Improve the user interface and design for a more engaging user experience.
- Implement input validation to ensure data integrity.
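
As one possible starting point for input validation, the sketch below checks raw form values before they reach the model. The `parse_features` helper and its `expected_count` parameter are hypothetical and not part of the current app.

```python
import math


def parse_features(raw_values, expected_count):
    """Convert raw form strings to floats, raising ValueError on bad input."""
    if len(raw_values) != expected_count:
        raise ValueError(f"expected {expected_count} values, got {len(raw_values)}")
    parsed = []
    for position, raw in enumerate(raw_values, start=1):
        try:
            value = float(raw)
        except (TypeError, ValueError):
            raise ValueError(f"value {position} is not a number: {raw!r}") from None
        # float() accepts "nan" and "inf", which a regression model cannot use.
        if not math.isfinite(value):
            raise ValueError(f"value {position} must be finite")
        parsed.append(value)
    return parsed
```

A route handler could call this on the submitted values and return an HTTP 400 response with the error message when it raises.
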
Cloud Deployment: Deploy the application to a cloud platform (AWS, Heroku, GCP) for scalability and accessibility.
Model Monitoring: Implement mechanisms to monitor the model's performance over time and retrain as needed.
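
For the model comparison suggested above, one option is to score each candidate with cross-validation on the same data. The sketch below uses synthetic data and arbitrary hyperparameters (`alpha`, `n_estimators`) chosen only for illustration; it shows the mechanics, not results for this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the housing features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X @ np.array([5.0, -2.0, 0.0, 1.0, 0.0, 3.0]) + rng.normal(scale=1.5, size=200)

# Linear models get a scaler inside the pipeline so scaling is refit per fold.
candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, estimator in candidates.items():
    # scikit-learn reports negated RMSE so that higher is always better.
    neg_rmse = cross_val_score(
        estimator, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    scores[name] = -neg_rmse.mean()

for name, rmse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {rmse:.3f}")
```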

oelbourki/End-to-End-Ml-Boston-house-pricing