-
EDA of rent pricing in New York (NY) boroughs with interactive dashboards, along with the development of an ML regression model.
-
To see the deployed application, click the link below. There you can test the models with your own instances, interact with dynamic dashboards about the dataset, or view static ones:
-
Python 3 and the pip package manager:
sudo apt install python3 python3-pip build-essential python3-dev
-
virtualenv tool:
pip install virtualenv
-
Libraries: pandas, scikit-learn, mlxtend, xgboost, lightgbm, Streamlit, Dash, Plotly Express, Kaleido, Matplotlib, seaborn, numpy, WordCloud, Cerberus, joblib, gdown;
-
Environments: Jupyter.
In this section, you can see the interactive and static dashboard screens made with Streamlit, as well as the predictor GUI.
-
Clone the repository
git clone https://github.com/juliorodrigues07/ny_rent_pricing.git
-
Enter the repository's directory
cd ny_rent_pricing
-
Create a virtual environment
python3 -m venv .venv
-
Activate the virtual environment
source .venv/bin/activate
-
Install the dependencies
pip install -r requirements.txt
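After installing, it can help to confirm that the libraries listed in the requirements are importable from the active virtual environment. The sketch below is not part of the repository: the `REQUIRED_MODULES` list and the `missing_modules` helper are hypothetical, and the import names are inferred from the library list (for example, scikit-learn is imported as `sklearn` and Plotly Express lives in the `plotly` package).

```python
from importlib.util import find_spec

# Import names assumed for the libraries listed in this README.
# Some differ from the pip package name (scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "pandas", "sklearn", "mlxtend", "xgboost", "lightgbm", "streamlit",
    "dash", "plotly", "kaleido", "matplotlib", "seaborn", "numpy",
    "wordcloud", "cerberus", "joblib", "gdown",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed libraries are importable.")
```

Run it with the virtual environment activated; any name it reports can usually be fixed by repeating the dependency installation step.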
-
Run the following commands from the dashboards directory.
-
With Streamlit:
streamlit run 1_🏠_Home.py
-
With Plotly Dash (dashboards only):
python3 dash_test.py
-
To view and run the notebooks online in Google Colaboratory, click on the following links:
-
To run the notebooks locally, run commands of the following form from the notebooks directory:
jupyter notebook <file_name>.ipynb
-
EDA (Exploratory Data Analysis):
jupyter notebook 1_eda.ipynb
-
Preprocessing:
jupyter notebook 2_preprocessing.ipynb
-
Machine Learning:
jupyter notebook 3_ml_methods.ipynb
-
To run the Python scripts locally, change to the src directory and run:
python3 main.py
.
├── README.md                      # Project's documentation
├── requirements.txt               # File containing all the dependencies required to run the project
├── plots                          # Directory containing all the plots generated in the EDA
├── assets                         # Directory containing images used in README.md and in the deployed app
├── notebooks                      # Directory containing the project's Jupyter notebooks
|   ├── 1_eda.ipynb
|   ├── 2_preprocessing.ipynb
|   └── 3_ml_methods.ipynb
├── dashboards                     # Directory containing the web application
|   ├── 1_🏠_Home.py               <- Main page with the price predictor
|   ├── pages                      # Child pages directory
|   |   ├── 2_📈_Interactive.py    <- Script responsible for generating the interactive dashboards
|   |   └── 3_📊_Static.py         <- Script responsible for generating the static dashboards
|   └── dash_test.py               <- Interactive and static dashboards made with the Dash library
├── src                            # Directory containing all the Python scripts for data mining
|   ├── main.py                    <- Main script for evaluating ML models
|   └── datamining                 # Directory containing the scripts responsible for the whole KDD process
|       ├── data_visualization.py
|       ├── preprocessing.py
|       ├── ml_methods.py
|       └── __init__.py
├── datasets                       # Directory containing all datasets used or generated in the project
|   ├── pricing.csv                <- Original dataset
|   ├── reduced.parquet            <- Result after applying memory optimization techniques to the original dataset
|   ├── filled.parquet             <- Result after imputing missing values in the reduced.parquet dataset
|   ├── preprocessed.parquet       <- Result after applying preprocessing techniques to the filled.parquet dataset
|   └── feature_selected.parquet   <- Final result after applying feature selection to the preprocessed.parquet dataset
└── models                         # Directory containing all models generated in the project
    ├── lgbm_model.pkl             <- LightGBM algorithm fitted model
    ├── xgb_model.pkl              <- XGBoost algorithm fitted model
    └── histgb_model.pkl           <- HistGradientBoosting algorithm fitted model
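The tree above describes `reduced.parquet` as the original dataset after memory optimization. This README does not say which techniques the project applies, so the following is only a minimal sketch of one common approach with pandas: downcasting numeric columns and converting low-cardinality text columns to `category`. The `reduce_memory` helper, its `max_category_ratio` parameter, and the column names in the toy frame are all hypothetical and not taken from the repository.

```python
import numpy as np
import pandas as pd


def reduce_memory(df: pd.DataFrame, max_category_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where the values allow it."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest integer type that still holds every value
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # float64 -> float32 when possible (trades precision for space)
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_string_dtype(series):
            # Text columns with few distinct values are cheaper as category
            if series.nunique() / max(len(series), 1) <= max_category_ratio:
                out[col] = series.astype("category")
    return out


# Toy data with made-up column names, for illustration only
toy = pd.DataFrame({
    "bedrooms": np.array([1, 2, 3, 2], dtype=np.int64),
    "price": np.array([1500.0, 2300.5, 3100.0, 1999.0], dtype=np.float64),
    "borough": ["Bronx", "Queens", "Bronx", "Queens"],
})
small = reduce_memory(toy)
print(small.dtypes)
print("bytes before:", toy.memory_usage(deep=True).sum())
print("bytes after: ", small.memory_usage(deep=True).sum())
```

Downcasting floats to 32 bits loses precision, so whether it is acceptable depends on the columns involved. Saving the result to Parquet (for example with `DataFrame.to_parquet`) keeps the reduced dtypes, which a CSV file would not.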
-
To uninstall all dependencies, run the following command:
pip uninstall -r requirements.txt -y
-
To deactivate the virtual environment, run the following command:
deactivate
-
To delete the virtual environment, run the following command:
rm -rf .venv