/ds-zap-challenge

Real estate price inference model using dataset from www.zapimoveis.com.br

Primary language: Jupyter Notebook
License: GNU General Public License v3.0 (GPL-3.0)

Data Science Challenge - Grupo ZAP

A machine learning model that predicts sale prices of apartments based on ads published at www.zapimoveis.com.br.

The dataset was provided by Grupo ZAP as part of its Data Science Challenge.

⚠ Disclaimer

This project should not be used in a production environment or for decision making without first validating its results.

This project has no support lifecycle and is intended for learning purposes only.


Project technologies

MLflow

Machine Learning experiments are tracked and models are saved using MLflow.

More information in docs/ml_model.md

FastAPI (via Docker)

The project exposes an API (powered by FastAPI) that other applications can consume.

More information in docs/api.md
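As an illustration of how another application might consume the API, the sketch below builds a JSON POST request using only Python's standard library. The base URL, the /predict route and the feature names are hypothetical placeholders, not the project's actual contract; the real endpoint and payload schema are described in docs/api.md.

```python
import json
import urllib.request

# Hypothetical values: the real host, port, route and payload schema are
# documented in docs/api.md and may differ from these placeholders.
BASE_URL = "http://localhost:8000"
PREDICT_PATH = "/predict"


def build_prediction_request(features: dict, base_url: str = BASE_URL,
                             path: str = PREDICT_PATH) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a price prediction."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a running API)."""
    request = build_prediction_request(features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical feature names for a single apartment ad.
    ad = {"usable_area": 70, "bedrooms": 2, "bathrooms": 1, "parking_spaces": 1}
    req = build_prediction_request(ad)
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload logic testable without a running server; `predict` only works once the API container is up.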

Streamlit

The project has a Data App (powered by Streamlit) that makes it easy to interact with the model and browse the documentation.


More information in docs/data_app.md

Folders

  • api: API code (powered by FastAPI).
  • app: Data App code (powered by Streamlit).
  • data: Datasets (raw and processed).
  • docs: Documentation files.
  • ds_code: Project code and modeling notebook.
  • mlruns: Machine learning experiments (tracked with MLflow).
  • properties: Application properties.

Project strategy

The project was divided into 3 parts.

Processing - ds_code/processing

  • Download, extract and preprocess the datasets.
  • Provide scripts shared by all steps.
  • Visualize the training dataset.

The visualizations are available in the Data App.
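A minimal sketch of the preprocessing step, assuming the raw ads arrive as gzipped JSON Lines files with nested fields. The field names and values used for filtering (unitTypes, pricingInfos.businessType, APARTMENT, SALE) are assumptions for illustration and may not match the actual dataset or the scripts in ds_code/processing.

```python
import gzip
import json
from typing import Iterable, Iterator


def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts, e.g. {"a": {"b": 1}} becomes {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def read_ads(path: str) -> Iterator[dict]:
    """Yield one flattened ad per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield flatten(json.loads(line))


def apartments_for_sale(ads: Iterable[dict],
                        type_key: str = "unitTypes",
                        business_key: str = "pricingInfos.businessType") -> Iterator[dict]:
    """Keep apartment ads listed for sale (hypothetical field names and values)."""
    for ad in ads:
        if ad.get(type_key) == "APARTMENT" and ad.get(business_key) == "SALE":
            yield ad
```

Streaming the file line by line keeps memory use flat regardless of dataset size; the flattened keys can be passed straight to a DataFrame constructor if pandas is used downstream.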

Modeling - ds_code/modeling

  • Refine the dataset and train the model.

Experiments are tracked on MLflow.

We performed careful feature selection on both datasets (training and test) and included geographic data from the 2010 IBGE Census.
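Adding census data can be pictured as a left join between ads and census areas. The sketch below assumes each ad already carries a geographic key (the hypothetical census_tract field) and that the census table has been loaded into a dict keyed by that code; the column names avg_income and population are placeholders, not the variables actually selected in this project.

```python
from typing import Iterable


def enrich_with_census(ads: Iterable[dict], census: dict,
                       key: str = "census_tract",
                       columns: tuple = ("avg_income", "population")) -> list:
    """Left-join census attributes onto each ad by a shared geographic key.

    The key and column names are hypothetical. Ads without a match get None
    for every census column, so the caller can later impute or drop them.
    """
    enriched = []
    for ad in ads:
        row = dict(ad)  # copy so the input records are not mutated
        area = census.get(ad.get(key), {})
        for col in columns:
            row[col] = area.get(col)
        enriched.append(row)
    return enriched
```

A left join keeps every ad even when no census area matches, which avoids silently shrinking the training or test set; how missing census values are handled is a separate modeling decision.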

See more information in the project documentation.


Results

The results, along with answers to some business questions, are in docs/report.md and on the Data App.


Documentation

All documentation for this project is written as Markdown (.md) files in the docs folder.

This documentation is also available on the Data App.


Project setup

All necessary packages are listed in requirements.txt.

To install them, run the command below in the project directory.

pip install -r requirements.txt
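After installing, one quick way to confirm the environment is complete is to check each requirement against the installed distributions. This is a hypothetical helper, not part of the repository, and it only understands simple requirement lines (a name with optional extras and version specifiers).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def requirement_names(path: str = "requirements.txt") -> list:
    """Extract distribution names from a simple requirements file.

    Handles comments, blank lines, extras and version specifiers; pip options
    such as -r or -e are skipped, and URL requirements are not supported.
    """
    names = []
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()
            if not line or line.startswith("-"):
                continue
            # The name ends at the first extras bracket, specifier, marker or space.
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
            if match:
                names.append(match.group(0))
    return names


def missing_packages(path: str = "requirements.txt") -> list:
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for name in requirement_names(path):
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling missing_packages() from the project directory returns an empty list when every package in requirements.txt is installed.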