Product-Sequence-Predictor-and-Recommender

The objective of this project is to enhance the online retail experience of a user by providing recommendations by gaining insights into the user behaviour and predicting the user's next purchase using Association Rule Mining and Markov Chains.

Libraries Used

Library	Function
numpy	To manipulate the data
pandas	To manipulate the data
matplotlib	For data visualization
seaborn	For data visualization
mlxtend	For association rule mining
fastapi	For creating API
uvicorn	For model deployment
streamlit	To build the UI
requests	To connect the UI with the API

Data Source Used

This is a Brazilian ecommerce public dataset of orders made at Olist Store. The dataset has information of 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil. Its features allows viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes and finally reviews written by customers. There is also a geolocation dataset that relates Brazilian zip codes to lat/lng coordinates.

This dataset was generously provided by Olist, the largest department store in Brazilian marketplaces. Olist connects small businesses from all over Brazil to channels without hassle and with a single contract. Those merchants are able to sell their products through the Olist Store and ship them directly to the customers using Olist logistics partners.

After a customer purchases the product from Olist Store a seller gets notified to fulfill that order. Once the customer receives the product, or the estimated delivery date is due, the customer gets a satisfaction survey by email where he can give a note for the purchase experience and write down some comments.

Database Schema

Exploratory Data Analysis

Due to huge size of the database, each table is explored individually and the required pre-processing is performed with the help of charts and graphs.

Churn Analysis

After performing Churn Analysis on the database, we infer that 97% of the customers churn out of the system, i.e., they make only one purchase in the entire lifetime of their purchase history. However, since the dataset has 10,00,000 data entries, we still have 3,000 entries to analyze.

Web Application

The Home Screen of the Web Application initially consists of two rows: Login and Trending Products. The user is prompted to enter their Unique Customer ID and login to the site. Based on the purchase history of the customer, one of the three cases is triggered.

Case-1: Cold-Start

If a new user, who has no purchase history, logs in, then the Top 5 trending products are recommended. This is the same as the Trending Products on the Home Page.

Case-2: Sequence Chain

If the user has previously purchased from the store, and their purchase is in a Chain of Products obtained using Association Rule Mining (Antecendent->Consequent:Antecedent->Consequent::) and Markov Chains (P(Conequent|Antecedent)), then the remaining items in the chain after the previously purchased product are recommended. If there is no remaining product in the Chain, then the Cold-Start case is triggered.

Case-3: Cross-Category Sell

If the user has previously purchased from the store, and their purchase is not in the Chain, then the category of the previous purchase is considered as the Antecedent, and the next category for purchase is obtained using Association Rule Mining. The Top 5 products of the Consequent category are then recommended. If there is no category in the Consequent, then the Cold-Start case is triggered.

Conclusion & Future Work

The API works accurately, however, takes a lot of time to render the output owing to the large size of the dataset, time complexity of generating association rules and building the chain. When deployed on a larger scale, the working of the algorithm can be parallelized to improve the time efficiency and achieve micro-second response. Also, the database can be dynamically updated to store the entries of the new purchases made by the customer, which may lead to changes in the chain of products and cross-category sales.

khusheekapoor/Product-Sequence-Predictor-and-Recommender