Kaggle-Brazilian-Ecommerce-Prediction

Predicting the delivery duration of orders from the Olist Brazilian Ecommerce Dataset


🛍️ Predicting Delivery Times Using a Real, Commercial Brazilian E-Commerce Dataset 🇧🇷💰

==============================

In this project, we develop a regression model that predicts how long an order will take to be delivered, using the Olist Ecommerce Dataset available on Kaggle.
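As a minimal sketch of what the regression target could look like, assume it is the number of days between the moment an order is purchased and the moment it reaches the customer. The README does not spell out the exact definition, so treat this as an illustration. The column names `order_purchase_timestamp` and `order_delivered_customer_date` come from Olist's orders file; the function name `delivery_days` and the sample order are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Timestamp format used in the Olist orders CSV, e.g. "2017-10-02 10:56:33"
OLIST_TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def delivery_days(order: dict) -> Optional[float]:
    """Return fractional days from purchase to customer delivery.

    Assumed target definition (illustrative only): delivered - purchased.
    Returns None when the order has no delivery date (e.g. canceled orders),
    since such rows have no target value to learn from.
    """
    delivered = order.get("order_delivered_customer_date")
    if not delivered:
        return None
    purchased_at = datetime.strptime(order["order_purchase_timestamp"], OLIST_TS_FORMAT)
    delivered_at = datetime.strptime(delivered, OLIST_TS_FORMAT)
    # Convert the timedelta to days, keeping the fractional part
    return (delivered_at - purchased_at).total_seconds() / 86_400


# Hypothetical order used only to illustrate the calculation
example = {
    "order_purchase_timestamp": "2017-10-02 10:56:33",
    "order_delivered_customer_date": "2017-10-10 21:25:13",
}
print(round(delivery_days(example), 2))  # 8.44
```

Keeping the fractional part, rather than whole days, gives the regression model a continuous target.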

Summary

For this project, we decided to push the limits of data processing.

Using the Polars library in combination with pandas, we cleaned, transformed, joined, and reshaped 8 files, each containing anywhere from 20 to 100K records, in under 5-6 minutes.

Leveraging the power of Z by HP to make your data science work 🤯mindblowingly🤯 fast.

(And spending less time staring at the screen, waiting for your training job to finish ⏳.)


Instructions

  1. Clone this Git repository using the following command: git clone https://github.com/MMBazel/Kaggle-Brazilian-Ecommerce-Prediction.git
  2. In the terminal or command prompt, cd into the Kaggle-Brazilian-Ecommerce-Prediction folder.
  3. Check whether Python is installed by running python --version in your terminal or command prompt. If it isn't, download and install it from the official website (python.org).
  4. Create a virtual environment for the project using the following command:
    • python -m venv <YOUR_ENV_NAME>
  5. Activate the virtual environment using the following command:
    • On Windows:
      • If you're using command line: <YOUR_ENV_NAME>\Scripts\activate.bat
      • If you're using PowerShell: <YOUR_ENV_NAME>\Scripts\Activate.ps1
    • On Linux/Mac: source <YOUR_ENV_NAME>/bin/activate
  6. Install the required packages using pip by running the following command:
    • pip install -r requirements.txt
  7. Manually (or programmatically) unzip the data archives in data/raw/ and data/backup/.
  8. Make sure you're in the project's root folder. In your terminal, run the command:
    • python3 src/main_script.py
  9. If the program runs successfully, a Streamlit dashboard should open in a browser window.
  10. When you're finished, press Ctrl+C in the terminal to exit Streamlit.
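For step 7, one way to unzip the archives programmatically is a small standard-library script like the sketch below. It makes two assumptions: the archives are ordinary `.zip` files sitting directly in `data/raw/` and `data/backup/`, and each archive should be extracted into the folder that contains it. The function name `unzip_data_folders` is hypothetical and is not part of the repository.

```python
import zipfile
from pathlib import Path
from typing import List

# Folders named in step 7, relative to the project root
DATA_DIRS = ("data/raw", "data/backup")


def unzip_data_folders(project_root: Path = Path(".")) -> List[Path]:
    """Extract every .zip archive in data/raw/ and data/backup/ into its own folder.

    Returns the paths of the extracted files (directory entries are skipped).
    """
    extracted: List[Path] = []
    for rel_dir in DATA_DIRS:
        folder = project_root / rel_dir
        if not folder.is_dir():
            continue  # nothing to do if this folder is missing
        for archive in sorted(folder.glob("*.zip")):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(folder)
                # Directory entries in a zip end with "/"; record files only
                extracted.extend(
                    folder / name for name in zf.namelist() if not name.endswith("/")
                )
    return extracted


if __name__ == "__main__":
    # Run from the project root, the same place step 8 expects
    for path in unzip_data_folders():
        print(f"extracted {path}")
```

Because `extractall` overwrites files that already exist, running the script twice is harmless.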

Project Organization

├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── raw            <- Data needed for the pipeline. Make sure everything has been unzipped.
│   └── backup         <- Backup files for the Streamlit dashboard. Make sure they're unzipped.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   │
│   └── main_script.py <- The only script you need to run: `python3 src/main_script.py`.
│                         Make sure you're in the project's root folder.
├── dashboard.py       <- Script for the Streamlit dashboard.
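The notebook naming convention above can be checked with a regular expression. The convention is a number for ordering, the creator's initials, and a `-` delimited description, e.g. `1.0-jqp-initial-data-exploration`. The sketch below is illustrative only. The exact pattern is inferred from that single example: the number looks like `1.0`, the initials are 2 to 4 lowercase letters, and the description uses lowercase words. The helper `parse_notebook_name` is hypothetical.

```python
import re

# Assumed pattern: "<major>.<minor>-<initials>-<dash-delimited-description>"
NOTEBOOK_NAME = re.compile(
    r"^(?P<number>\d+\.\d+)"  # ordering number, e.g. 1.0
    r"-(?P<initials>[a-z]{2,4})"  # creator's initials, e.g. jqp
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"  # e.g. initial-data-exploration
)


def parse_notebook_name(stem: str):
    """Split a notebook file name (without .ipynb) into its parts.

    Returns a dict with "number", "initials" and "description",
    or None if the name does not follow the assumed convention.
    """
    match = NOTEBOOK_NAME.match(stem)
    return match.groupdict() if match else None


print(parse_notebook_name("1.0-jqp-initial-data-exploration"))
# {'number': '1.0', 'initials': 'jqp', 'description': 'initial-data-exploration'}
```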

Running The Script

📹 Video 📹

Screenshots



Playing With The Dashboard

📹 Video 📹

Screenshots
