Delivery-Delay-Prediction

COMS4995 AML Project

AML Final Project

Lukas Wang, Qi Meng, Gursifath Bhasin, Jonathan Benghiat, Siddhant Pravin Mahurkar

Background

The rise of e-commerce is closely tied to the decline of many brick-and-mortar stores, because online retailers can offer far more products and experiences without the physical limits of a storefront. A critical problem for this business, however, is delivery delay, which leads to a negative user experience. Companies that do not act to mitigate delivery problems often experience a decline in sales. We perform delivery delay prediction on an e-commerce dataset, which can give a business insight into the trends behind delays and help it mitigate their causes. We begin by exploring our data thoroughly, then select and apply ML techniques to the problem of predicting delivery delays on a fairly high-dimensional dataset that spans multiple years. Finally, we evaluate the performance of our models and select the best one.
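
As a rough illustration of the prediction target, the sketch below labels an order as delayed when it reaches the customer after the estimated delivery date. This definition is an assumption and not necessarily the one used in the notebooks. The column names `order_delivered_customer_date` and `order_estimated_delivery_date` are taken from the orders table of the Kaggle dataset, the timestamp layout is assumed, and the helper `is_delayed` and the sample rows are hypothetical.

```python
from datetime import datetime

# Timestamp layout assumed for the orders table, e.g. "2017-10-10 21:25:13".
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_delayed(order):
    """Label one order: True if late, False if on time, None if no label is possible.

    Assumed definition: an order is delayed when it is delivered to the
    customer after the estimated delivery date.
    """
    delivered = order.get("order_delivered_customer_date")
    estimated = order.get("order_estimated_delivery_date")
    if not delivered or not estimated:
        return None  # undelivered or incomplete record
    delivered_at = datetime.strptime(delivered, TS_FORMAT)
    estimated_at = datetime.strptime(estimated, TS_FORMAT)
    return delivered_at > estimated_at


# Made-up rows for illustration only (not taken from the dataset).
sample_orders = [
    {"order_delivered_customer_date": "2018-01-10 14:00:00",
     "order_estimated_delivery_date": "2018-01-15 00:00:00"},
    {"order_delivered_customer_date": "2018-02-20 09:30:00",
     "order_estimated_delivery_date": "2018-02-12 00:00:00"},
    {"order_delivered_customer_date": "",
     "order_estimated_delivery_date": "2018-03-01 00:00:00"},
]

labels = [is_delayed(order) for order in sample_orders]
```

Orders without a delivery date get no label here; whether to drop or otherwise handle them is a modeling choice left open by this sketch.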

Explanations of the folders:

  • Notebook contains the entire workflow of this project, ordered by the prefix of the file names.

  • util contains the commonly used functions and package imports shared across all notebooks.

  • Dataset contains the data downloaded from https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce.

  • Model contains temporary files, such as the cleaned data that each model file reads, as well as the trained models themselves.

  • Report contains the graphs, results, and report.

NOTE: Files larger than 25MB are stored using Git Large File Storage (LFS). More details here: https://git-lfs.github.com/
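
If the repository is cloned without Git LFS installed, the large files may show up as small text pointer stubs instead of the real data. The sketch below is one way to check for that, assuming the standard pointer format whose first line begins with `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` is hypothetical and not part of this repository.

```python
from pathlib import Path

# First bytes of a Git LFS pointer file (standard pointer format, spec v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer stub rather than real content."""
    with Path(path).open("rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER
```

If this returns True for a data or model file, running `git lfs pull` inside the repository should fetch the actual content.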