Rent pricing prediction on NY properties with interactive dashboards.

Jupyter Notebook, Python 3.10.12, Linux

Rent Pricing

  • EDA (Exploratory Data Analysis) of rent pricing in the boroughs of New York (NY) with interactive dashboards, along with the development of an ML regression model.

  • To try the deployed application, follow the link below. There you can test the models with your own inputs, interact with dynamic dashboards about the dataset, or view static ones:

    • Deploy: Deploy

1. Requirements

2. Screens

This section shows the interactive and static dashboard screens built with Streamlit, as well as the predictor GUI.

2.1. Price Predictor

Predictor

2.2. Interactive Dashboard

Interactive

2.3. Static Dashboard

Static

3. Execution

  1. Clone the repository

    git clone https://github.com/juliorodrigues07/ny_rent_pricing.git
    
  2. Enter the repository's directory

    cd ny_rent_pricing
    
  3. Create a virtual environment

    python3 -m venv .venv
    
  4. Activate the virtual environment

    source .venv/bin/activate
    
  5. Install the dependencies

    pip install -r requirements.txt
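
Before installing the dependencies, it can help to confirm that the virtual environment from step 4 is actually active. The following is a minimal sketch using only the standard library; the function name `in_virtualenv` is an illustrative choice and is not part of the repository:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; run 'source .venv/bin/activate' first.")
```

If the script reports that no environment is active, `pip install -r requirements.txt` would install the packages into the system or user Python instead of `.venv`.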
    

3.1. Predictor and Dashboards

  • Run the commands below from inside the dashboards directory.

    • With Streamlit:

      streamlit run 1_🏠_Home.py
      
    • With Plotly Dash (dashboards only):

      python3 dash_test.py
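
Both commands above assume that the current directory is dashboards. As a rough sketch of how the Streamlit launch could be scripted from the repository root instead, the hypothetical helper below (`streamlit_launch` is not part of the repository) builds the command and the working directory it should run in:

```python
import sys
from pathlib import Path


def streamlit_launch(repo_root: str) -> tuple[list[str], Path]:
    """Build the Streamlit command and the directory it should run from."""
    dashboards_dir = Path(repo_root) / "dashboards"
    # "python -m streamlit run <script>" behaves like "streamlit run <script>",
    # but explicitly uses the interpreter of the active environment.
    command = [sys.executable, "-m", "streamlit", "run", "1_🏠_Home.py"]
    return command, dashboards_dir


# Usage (commented out so that running this sketch has no side effects):
# import subprocess
# command, cwd = streamlit_launch(".")
# subprocess.run(command, cwd=cwd, check=True)  # cwd= starts the app inside dashboards/
```

Passing `cwd=` to `subprocess.run` has the same effect as changing into the dashboards directory first, so any relative paths the app relies on would resolve the same way.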
      

3.2. Data Mining

  • To visualize the notebooks online and run them (Google Colaboratory), click on the following links:

  • To run the notebooks locally, run the commands in the notebooks directory following the template: jupyter notebook <file_name>.ipynb.

    • EDA (Exploratory Data Analysis):

      jupyter notebook 1_eda.ipynb
      
    • Preprocessing:

      jupyter notebook 2_preprocessing.ipynb
      
    • Machine Learning:

      jupyter notebook 3_ml_methods.ipynb
      
  • To run the Python scripts locally, change into the src directory and then run the command:

    python3 main.py
    

4. Project Structure

.
├── README.md                       # Project's documentation
├── requirements.txt                # File containing all the required dependencies to run the project
├── plots                           # Directory containing all the graph plots generated in EDA
├── assets                          # Directory containing images used in README.md and in the deployed app
├── notebooks                       # Directory containing the project's Jupyter notebooks
|   ├── 1_eda.ipynb
|   ├── 2_preprocessing.ipynb
|   └── 3_ml_methods.ipynb
├── dashboards                      # Directory containing the web application
|   ├── 1_🏠_Home.py                <- Main page with the price predictor
|   ├── pages                       # Child pages directory
|   |   ├── 2_📈_Interactive.py     <- Script responsible for generating the interactive dashboards
|   |   └── 3_📊_Static.py          <- Script responsible for generating the static dashboards
|   └── dash_test.py                <- Interactive and static dashboards made with Dash library
├── src                             # Directory containing all the Python scripts for data mining
|   ├── main.py                     <- Main script for evaluating ML models
|   └── datamining                  # Directory containing the scripts responsible for the whole KDD process
|       ├── data_visualization.py
|       ├── preprocessing.py
|       ├── ml_methods.py
|       └── __init__.py
├── datasets                        # Directory containing all used or generated datasets in the project
|   ├── pricing.csv                 <- Original dataset
|   ├── reduced.parquet             <- Result after applying memory optimization techniques to the original dataset
|   ├── filled.parquet              <- Result after imputing missing values in the reduced.parquet dataset
|   ├── preprocessed.parquet        <- Result after applying preprocessing techniques on the filled.parquet dataset
|   └── feature_selected.parquet    <- Final result after applying feature selection on the preprocessed.parquet dataset
└── models                          # Directory containing all generated models in the project
    ├── lgbm_model.pkl              <- LightGBM algorithm fitted model
    ├── xgb_model.pkl               <- XGBoost algorithm fitted model
    └── histgb_model.pkl            <- HistGradientBoosting algorithm fitted model
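
The fitted models in the models directory can be reloaded for prediction outside the app. The sketch below assumes the `.pkl` files were written with the standard `pickle` module (joblib is another common choice, and the repository may use it instead); `load_model` is a hypothetical helper, not a function from the project:

```python
import pickle
from pathlib import Path
from typing import Any


def load_model(path: "str | Path") -> Any:
    """Deserialize a fitted model stored as a pickle file."""
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical usage, assuming the file below exists and the library that
# produced it (here LightGBM) is installed in the active environment:
# model = load_model("models/lgbm_model.pkl")
# prices = model.predict(features)  # `features` must match the training columns
```

The feature columns expected by each model are not listed in this README, so the `features` variable in the usage comment is a placeholder.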

5. Outro

  • To uninstall all dependencies, run the following command:

    pip uninstall -r requirements.txt -y
    
  • To deactivate the virtual environment, run the following command:

    deactivate
    
  • To delete the virtual environment, run the following command:

    rm -rf .venv