
A Product Sequence Predictor and Recommender Application made as a part of the Machine Learning Lab Course in the curriculum of B. Tech. Data Science & Engineering at Manipal Institute of Technology.


Product-Sequence-Predictor-and-Recommender

The objective of this project is to enhance a user's online retail experience by gaining insights into user behaviour and predicting the user's next purchase. Recommendations are generated using Association Rule Mining and Markov Chains.

Libraries Used

| Library | Function |
|---|---|
| numpy | To manipulate the data |
| pandas | To manipulate the data |
| matplotlib | For data visualization |
| seaborn | For data visualization |
| mlxtend | For association rule mining |
| fastapi | For creating the API |
| uvicorn | For model deployment |
| streamlit | To build the UI |
| requests | To connect the UI with the API |

This is a public Brazilian e-commerce dataset of orders made at Olist Store. The dataset has information on 100k orders made from 2016 to 2018 at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes and, finally, reviews written by customers. There is also a geolocation dataset that relates Brazilian zip codes to lat/lng coordinates.

This dataset was generously provided by Olist, the largest department store in Brazilian marketplaces. Olist connects small businesses from all over Brazil to channels without hassle and with a single contract. Those merchants are able to sell their products through the Olist Store and ship them directly to the customers using Olist logistics partners.

After a customer purchases a product from Olist Store, a seller is notified to fulfill that order. Once the customer receives the product, or the estimated delivery date is due, the customer gets a satisfaction survey by email where they can rate the purchase experience and write down some comments.

Database Schema

[Figure: Database Schema]

Due to the large size of the database, each table is explored individually, and the required pre-processing is carried out with the help of charts and graphs.

Churn Analysis

After performing Churn Analysis on the database, we infer that 97% of the customers churn out of the system, i.e., they make only one purchase in their entire purchase history. However, since the dataset has about 100,000 entries, the remaining 3% still leaves roughly 3,000 entries to analyze.
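The churn figure above can be reproduced in spirit with a small sketch. This is a minimal illustration using only the standard library, not the notebook's actual code: the `orders` rows, the `churn_rate` helper, and the definition of churn as "exactly one distinct order per customer" are assumptions made for the example.

```python
from collections import Counter

# Hypothetical (customer_id, order_id) rows standing in for the orders table.
orders = [
    ("c1", "o1"),
    ("c2", "o2"),
    ("c2", "o3"),
    ("c3", "o4"),
    ("c4", "o5"),
]

def churn_rate(order_rows):
    """Return the fraction of customers who placed exactly one order."""
    # Drop duplicate (customer, order) rows, e.g. one row per order item.
    unique_orders = set(order_rows)
    # Count distinct orders per customer.
    orders_per_customer = Counter(customer for customer, _ in unique_orders)
    if not orders_per_customer:
        return 0.0
    one_time = sum(1 for n in orders_per_customer.values() if n == 1)
    return one_time / len(orders_per_customer)

rate = churn_rate(orders)
print(f"Churn rate: {rate:.0%}")
```

In this toy data three of the four customers ordered only once, so the helper reports a churn rate of 75%. On the real dataset the same idea would need a stable customer identifier that is shared across a customer's orders.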

Web Application

The Home Screen of the Web Application initially consists of two rows: Login and Trending Products. The user is prompted to enter their Unique Customer ID and log in to the site. Based on the customer's purchase history, one of three cases is triggered.

[Screenshot: Home Page]
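The three cases described in the sections that follow, including their fallbacks to Cold-Start, can be sketched as a single dispatch function. This is a hedged illustration only: the function name `recommend`, its parameters, and the simple dictionaries and lists used as inputs are all hypothetical stand-ins, not the project's actual API.

```python
def recommend(customer_id, last_purchase, chain, next_category, top_products, trending):
    """Pick a recommendation list following the three cases.

    All parameters are illustrative assumptions:
    last_purchase : dict customer_id -> (product, category) of the latest purchase
    chain         : list of products in predicted purchase order
    next_category : dict antecedent category -> consequent category
    top_products  : dict category -> best-selling products of that category
    trending      : list of overall trending products, most popular first
    """
    # Case 1: no purchase history, so recommend the top 5 trending products.
    if customer_id not in last_purchase:
        return "cold-start", trending[:5]

    product, category = last_purchase[customer_id]

    # Case 2: the last purchase is in the chain, so recommend what follows it.
    if product in chain:
        remaining = chain[chain.index(product) + 1:]
        if remaining:
            return "sequence-chain", remaining
        return "cold-start", trending[:5]

    # Case 3: use the category of the last purchase as the antecedent.
    consequent = next_category.get(category)
    if consequent is not None:
        return "cross-category", top_products.get(consequent, [])[:5]
    return "cold-start", trending[:5]


# Toy inputs to exercise each branch.
last_purchase = {
    "u2": ("phone", "electronics"),
    "u3": ("sofa", "furniture"),
    "u4": ("charger", "electronics"),
}
chain = ["phone", "case", "charger"]
next_category = {"furniture": "bed_bath"}
top_products = {"bed_bath": ["sheet", "pillow"]}
trending = ["a", "b", "c", "d", "e", "f"]

for user in ("u1", "u2", "u3", "u4"):
    print(user, recommend(user, last_purchase, chain, next_category, top_products, trending))
```

Returning the case label alongside the products is a design choice made for this sketch so that a UI could show which case fired; the original application may structure its response differently.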

Case-1: Cold-Start

If a new user, who has no purchase history, logs in, then the Top 5 trending products are recommended. This is the same as the Trending Products on the Home Page.

[Screenshot: Cold Start]
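A minimal sketch of the Cold-Start case is shown below. The source does not define how "trending" is measured, so this example assumes it simply means the most frequently purchased products; the `top_trending` helper and the sample data are hypothetical.

```python
from collections import Counter

def top_trending(purchased_products, k=5):
    """Return the k most frequently purchased products, most popular first."""
    counts = Counter(purchased_products)
    return [product for product, _ in counts.most_common(k)]

# Hypothetical list with one entry per purchased item.
purchases = ["bed", "toy", "bed", "pen", "toy", "bed"]
print(top_trending(purchases, k=2))
```

A time-windowed count (for example, purchases in the most recent month only) would be another reasonable reading of "trending".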

Case-2: Sequence Chain

If the user has previously purchased from the store, and their purchase is in a Chain of Products obtained using Association Rule Mining (Antecedent->Consequent:Antecedent->Consequent::) and Markov Chains (P(Consequent|Antecedent)), then the items that follow the previously purchased product in the Chain are recommended. If no product remains in the Chain, then the Cold-Start case is triggered.

[Screenshot: Sequence Chain]
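To make the Markov Chain part concrete, the sketch below estimates P(Consequent|Antecedent) from ordered purchase sequences and then follows the most probable transitions to form a chain. It is one possible reading of the approach, not the project's implementation: it covers only the Markov step (no association rule mining), and the function names, the greedy chain construction, and the sample sequences are assumptions.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate P(next | current) from ordered purchase sequences."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            pair_counts[current][nxt] += 1
    probs = {}
    for current, counter in pair_counts.items():
        total = sum(counter.values())
        probs[current] = {nxt: n / total for nxt, n in counter.items()}
    return probs

def build_chain(probs, start, max_len=10):
    """Greedily follow the most probable transition, never revisiting a product."""
    chain = [start]
    while chain[-1] in probs and len(chain) < max_len:
        candidates = {p: v for p, v in probs[chain[-1]].items() if p not in chain}
        if not candidates:
            break
        chain.append(max(candidates, key=candidates.get))
    return chain

# Hypothetical per-customer purchase sequences, oldest purchase first.
sequences = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["case", "charger"],
]
probs = transition_probabilities(sequences)
chain = build_chain(probs, "phone")
print(chain)
```

Here "phone" is followed by "case" in two of three observed transitions, and "case" is always followed by "charger", so the greedy chain is phone, case, charger. A customer whose last purchase was "phone" would then be recommended "case" and "charger".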

Case-3: Cross-Category Sell

If the user has previously purchased from the store, and their purchase is not in the Chain, then the category of the previous purchase is considered as the Antecedent, and the next category for purchase is obtained using Association Rule Mining. The Top 5 products of the Consequent category are then recommended. If there is no category in the Consequent, then the Cold-Start case is triggered.

[Screenshot: Cross-Category Sell]
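The category-level rule lookup can be illustrated with a from-scratch confidence calculation. The project uses mlxtend for association rule mining; this sketch deliberately avoids that library and computes single-item rules by hand so that it stays self-contained. The `category_rules` helper, the `min_confidence` threshold, and the idea of treating each customer's set of purchased categories as one basket are assumptions.

```python
from collections import Counter
from itertools import permutations

def category_rules(baskets, min_confidence=0.5):
    """Map each antecedent category to its strongest consequent.

    confidence(A -> C) = (# baskets containing A and C) / (# baskets containing A)
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # sorted for a deterministic order
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))
    rules = {}
    for (antecedent, consequent), n in pair_counts.items():
        confidence = n / item_counts[antecedent]
        if confidence < min_confidence:
            continue
        # Keep only the highest-confidence consequent per antecedent.
        if antecedent not in rules or confidence > rules[antecedent][1]:
            rules[antecedent] = (consequent, confidence)
    return rules

# Hypothetical baskets: the categories each customer has bought from.
baskets = [
    ["furniture", "bed_bath"],
    ["furniture", "bed_bath"],
    ["furniture", "toys"],
    ["toys", "electronics"],
]
rules = category_rules(baskets)

# Hypothetical best sellers per category.
top_products = {"bed_bath": ["sheet", "pillow", "towel"]}

consequent, confidence = rules["furniture"]
print(consequent, round(confidence, 2), top_products[consequent][:5])
```

In this toy data, two of the three baskets containing "furniture" also contain "bed_bath", so the rule furniture -> bed_bath has confidence 2/3 and the top "bed_bath" products would be recommended. A category with no entry in `rules` corresponds to the fallback to Cold-Start.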

Conclusion & Future Work

The API works accurately but takes a long time to render the output, owing to the large size of the dataset and the time complexity of generating association rules and building the chain. When deployed at a larger scale, the algorithm can be parallelized to improve time efficiency, with microsecond-level response as the target. The database can also be updated dynamically to store new purchases made by customers, which may change the chain of products and the cross-category sales.