/mlops-zoomcamp-project-paris-price-house

MLOps Paris Housing Price Prediction.

Primary Language: Jupyter Notebook

MLOps Project: Property Price Forecast in Paris

Deployment URL: https://vercel-app-mlops-zoomcamp-project-paris-price-house.vercel.app

About

This project focuses on predicting property prices in the urban environment of Paris based on various characteristics. The primary objective is to provide accurate price estimates for properties using machine learning models, aiding potential buyers, real estate agents, and developers in making informed decisions. The project is part of the MLOps Zoomcamp course by DataTalks.Club, designed to teach practical MLOps skills and methodologies for deploying and managing machine learning models at scale.

Project Overview

This project aims to predict house prices in the urban environment of Paris using a variety of features. The primary goal is to provide accurate property price estimations through machine learning models. This information will assist potential buyers, real estate agents, and developers in making well-informed decisions.

Dataset Description

The dataset used in this project consists of synthetic data on house prices in Paris. It includes numeric attributes representing various features of the properties. This dataset is particularly valuable for educational purposes, practice, and gaining knowledge in machine learning and data analysis.

The dataset can be found on Kaggle: Paris Housing Price Prediction.

Context and Content of the Dataset

This dataset is created from synthetic data representing house prices in the urban environment of Paris. It is ideal for educational purposes, practice, and gaining essential knowledge in machine learning and data analysis.

Next, the goal is to create a classification dataset from the existing data by adding a new column for the class attribute.
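
As a minimal sketch of that step, assuming the class is derived by thresholding `price`: the column name `category`, the labels `Basic` and `Luxury`, and the median-based default threshold are all hypothetical and are not taken from the project.

```python
from statistics import median


def add_price_class(rows, threshold=None):
    """Return copies of the rows with an added 'category' column.

    Assumption: the class is derived from the price. The labels and the
    median-based default threshold are illustrative only.
    """
    prices = [float(row["price"]) for row in rows]
    if threshold is None:
        threshold = median(prices)
    labelled = []
    for row, price in zip(rows, prices):
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["category"] = "Luxury" if price > threshold else "Basic"
        labelled.append(new_row)
    return labelled


# Tiny made-up sample, not real rows from ParisHousing.csv.
sample = [
    {"squareMeters": 50, "price": 100000.0},
    {"squareMeters": 80, "price": 200000.0},
    {"squareMeters": 120, "price": 900000.0},
]
labelled_sample = add_price_class(sample)
```

Passing an explicit `threshold` keeps the labelling stable when new data arrives, whereas the median default depends on whichever rows are supplied.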

The dataset contains more than just rows and columns; it includes detailed house attributes listed as column names.

Description of Dataset Columns

All attributes are numeric variables and are listed below:

  • squareMeters: Total area of the house in square meters.

  • numberOfRooms: Total number of rooms in the house.

  • hasYard: Indicates whether the house has a yard (1 for yes, 0 for no).

  • hasPool: Indicates whether the house has a pool (1 for yes, 0 for no).

  • floors: Number of floors in the house.

  • cityCode: Zip code of the house's location.

  • cityPartRange: A range value indicating the exclusivity of the neighborhood; higher values denote more exclusive areas.

  • numPrevOwners: Number of previous owners the house has had.

  • made: Year the house was built.

  • isNewBuilt: Indicates whether the house is newly built (1 for yes, 0 for no).

  • hasStormProtector: Indicates whether the house has storm protection features (1 for yes, 0 for no).

  • basement: Area of the basement in square meters.

  • attic: Area of the attic in square meters.

  • garage: Size of the garage in square meters.

  • hasStorageRoom: Indicates whether the house has a storage room (1 for yes, 0 for no).

  • hasGuestRoom: Number of guest rooms in the house.

  • price: Predicted price value of the house.

Dataset details:

  • Dataset Size: ParisHousing.csv (633.42 kB)
  • Tags: Real Estate; Hotels and Accommodations; Regression; Cities and Urban Areas; Housing; Linear Regression
  • Dataset available at Kaggle: Paris Housing Price Prediction
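
The column list above can be turned into a quick sanity check before training. The sketch below uses only the standard library and assumes the CSV header uses exactly the column names listed in this section; the function name `check_schema` is hypothetical.

```python
import csv
import io

# Column names as listed in this README.
EXPECTED_COLUMNS = [
    "squareMeters", "numberOfRooms", "hasYard", "hasPool", "floors",
    "cityCode", "cityPartRange", "numPrevOwners", "made", "isNewBuilt",
    "hasStormProtector", "basement", "attic", "garage", "hasStorageRoom",
    "hasGuestRoom", "price",
]
# Columns described above as 1 for yes, 0 for no.
BINARY_COLUMNS = [
    "hasYard", "hasPool", "isNewBuilt", "hasStormProtector", "hasStorageRoom",
]


def check_schema(csv_file):
    """Return a list of problems found in an open CSV file (empty if none)."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in BINARY_COLUMNS:
            try:
                is_binary = float(row[col]) in (0.0, 1.0)
            except (TypeError, ValueError):
                is_binary = False
            if not is_binary:
                problems.append(f"line {line_no}: {col}={row[col]!r} is not 0 or 1")
    return problems


# In-memory demo with made-up rows.
header_line = ",".join(EXPECTED_COLUMNS)
values = {col: "1" for col in EXPECTED_COLUMNS}
good_csv = header_line + "\n" + ",".join(values[c] for c in EXPECTED_COLUMNS) + "\n"
values["hasPool"] = "2"
bad_csv = header_line + "\n" + ",".join(values[c] for c in EXPECTED_COLUMNS) + "\n"
```

With the real file, the same check would be called as `check_schema(open("ParisHousing.csv", newline=""))`, assuming the file sits in the working directory.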

Prerequisites and Description of Selected Frameworks and Tools

To successfully execute this MLOps project focused on predicting house prices in Paris, the following prerequisites and tools are required:

  • Git: Used for version control to manage and track changes in the project codebase, facilitating collaboration and continuous integration.
  • GitHub: A platform for hosting Git repositories, enabling code collaboration, continuous integration (CI), and continuous deployment (CD).
  • Visual Studio Code: An integrated development environment (IDE) used for debugging and managing Python code and other files, providing a robust environment for code development.
  • Jupyter Notebook: An open-source web application used for data exploration, analysis, and visualization, allowing interactive development and testing of machine learning models.
  • PostgreSQL: A powerful, open-source relational database management system used for storing and retrieving structured data, ensuring data persistence and integrity.
  • Anaconda: A distribution of Python for scientific computing and data science, used for package management and creating virtual environments to manage dependencies.
  • Docker: A platform used for containerization, enabling the application to run consistently across different environments by packaging all dependencies and configurations into a container.
  • Flask: A lightweight web framework for Python, used for deploying the machine learning model as a web service, providing an API for interaction with the model.
  • Grafana: An open-source platform for monitoring and observability, used for visualizing and monitoring the performance of the prediction model, ensuring its reliability and efficiency.
  • MLflow: An open-source platform for managing the machine learning lifecycle, used for experiment tracking, model logging, and deployment, ensuring reproducibility and versioning of models.
  • Node.js: A JavaScript runtime for building scalable network applications, useful for developing the front-end interface or microservices in the project.
  • Prefect: A workflow orchestration tool used for automating, scheduling, and monitoring data workflows, ensuring the smooth execution of data pipelines.
  • Pandas: A data manipulation and analysis library for Python, used for handling structured data and performing exploratory data analysis (EDA).
  • Scikit-learn: A machine learning library for Python, used for building, training, and evaluating the prediction model, providing various tools and algorithms for predictive modeling.
  • Matplotlib: A plotting library for Python, used for data visualization to create static, interactive, and animated plots, helping in data analysis and presentation.
  • Vercel: A cloud platform used for hosting and deploying web applications, ensuring the deployment and scalability of the project's web components.

Paris Price House App

This is a Next.js and Python based ML application for house price prediction in Paris, France.

Getting Started

Prerequisites

  • Node.js installed on your machine
  • Python installed on your machine

Installation

  1. Create a new Next.js app using the latest version of create-next-app:
npx create-next-app@latest paris-price-house
  2. Change into the newly created app directory:
cd paris-price-house
  3. Install Axios, a popular HTTP client library:
npm install axios

Running the App

  1. Run the Python script that fetches data from an external API:
python app.py
  2. Start the Next.js development server:
npx next

This will start the development server and make the app available at http://localhost:3000.

Running the Project MLflow

  1. Start MLflow Server

Run the following command to start the MLflow server with a SQLite backend store and default artifact root:

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0

This will start the MLflow server, which will track experiments and store artifacts in the mlruns directory.

  2. Train Paris Housing Model

Run the following command to train the Paris Housing model:

python train_paris_housing_model.py

This will train a machine learning model on the Paris Housing dataset and log the experiment to MLflow.

  3. Deploy Paris Housing API

Run the following command to deploy the Paris Housing API:

python paris_housing_api.py

This will deploy a REST API that serves the trained model.
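
Once the API is running, a client can send it a property's features as JSON. The sketch below only builds the request. The URL, port, and `/predict` route are hypothetical, since this README does not state them, and the feature subset is made up for illustration.

```python
import json
import urllib.request

# Hypothetical endpoint: the README does not state the API's host, port, or route.
PREDICT_URL = "http://localhost:5000/predict"


def build_predict_request(features, url=PREDICT_URL):
    """Build a JSON POST request carrying one property's features."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative subset of the dataset columns; a real call would likely need all of them.
example_features = {"squareMeters": 75, "numberOfRooms": 3, "hasYard": 0, "hasPool": 0}
request = build_predict_request(example_features)

# To actually send it (requires the API to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```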

  4. Start Prefect Server

Run the following command to start the Prefect server:

prefect server start

This will start the Prefect server, which will schedule and run the Prefect flow.

  5. Run Paris Housing Prefect Flow

Run the following command to run the Paris Housing Prefect flow:

python paris_housing_prefect_flow.py

This will schedule and run the Prefect flow, which will execute the trained model.

  6. Run Tests

Run the following command to run tests for the Paris Housing API:

python tests/test_paris_housing_api.py

This will run tests to ensure the API is functioning correctly.

Model Execution Result

[Screenshots: model execution results, images 1 to 8]

Monitoring and Observability with Grafana

[Screenshot: Grafana]
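
This README does not say which metrics the monitoring reports. As a hedged illustration, the sketch below computes two common regression metrics, mean absolute error (MAE) and root mean squared error (RMSE), which could be written to a store that Grafana reads. The function name and the choice of metrics are assumptions.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE and RMSE for paired lists of actual and predicted prices."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Made-up values for illustration, not results from the project.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

RMSE penalizes large individual errors more heavily than MAE, so tracking both gives a rough view of whether errors are uniformly small or dominated by a few outliers.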

Deployment vercel-app-mlops-zoomcamp-project-paris-price-house.vercel.app

The project is deployed on Vercel, making it easily accessible for users. The deployment ensures that users can input property details through the frontend and get real-time price predictions.

How to Use the Vercel App: Property Price Forecast in Paris

1. Enter the property information in the input fields provided in the interface. Only non-negative numbers are allowed.

2. Fields with no applicable information must be filled in with 0 to indicate absence.

3. If any field is not filled in, clicking the "Estimate Price" button will return the alert "Please fill in all fields to get a forecast!".

4. After filling in all the fields, click on the "Estimate Price" button to obtain the predicted price of the property.

5. View the result: the predicted price is displayed on the screen, providing an estimate based on the input data. Example: "Forecasted Price: €557642.1".

6. Click the "Clear Fields" button to reset all fields.
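
The input rules in the steps above can be sketched as a small validation function. The deployed front end is a Next.js app, so this Python version is only an illustration of the logic, not the project's code. The function name and the two error messages other than the documented alert are hypothetical.

```python
# Alert text quoted from the usage steps above.
ALERT_MESSAGE = "Please fill in all fields to get a forecast!"


def validate_form(fields):
    """Return (ok, message) for a dict mapping field names to raw input values."""
    # Every field must be filled in; absent features are entered as 0.
    if any(value is None or str(value).strip() == "" for value in fields.values()):
        return False, ALERT_MESSAGE
    for name, value in fields.items():
        try:
            number = float(value)
        except (TypeError, ValueError):
            return False, f"{name} must be a number"  # hypothetical message
        if number < 0:
            return False, f"{name} must not be negative"  # hypothetical message
    return True, ""


incomplete = validate_form({"squareMeters": "75", "hasPool": ""})
complete = validate_form({"squareMeters": "75", "hasPool": "0"})
negative = validate_form({"squareMeters": "-1"})
```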

Importance of the Project

Predicting property prices accurately is crucial for various stakeholders in the real estate market. This project leverages machine learning to provide reliable price estimates, which can help:

  • Buyers: Make informed decisions about purchasing properties.
  • Real Estate Agents: Provide accurate price recommendations to clients.
  • Developers: Evaluate potential investments and project profitability.

By using this project, users can gain insights into property pricing trends in Paris and make better financial decisions.

This project serves as an excellent example of applying MLOps principles to a real-world problem, demonstrating the integration of data science, machine learning, and operational processes to deliver valuable insights and solutions.

Project Best Practices

  • ✅ Problem description: The project is well described and it's clear and understandable.
  • ✅ Experiment tracking and model registry: Both experiment tracking and model registry are used.
  • ✅ Workflow orchestration: Fully deployed workflow.
  • ✅ Model deployment: The model deployment code is containerized and can be deployed to the cloud.
  • ✅ Model monitoring: Basic model monitoring that calculates and reports metrics.
  • ✅ Reproducibility: Instructions are clear, it's easy to run the code, and it works. The versions for all the dependencies are specified.
  • ✅ Visualization: Visualization of the practical project in Vercel.

Acknowledgments

This project was developed as part of the MLOps Zoomcamp course by DataTalks.Club. Special thanks to the course instructors and the community for their support and guidance.