Instacart Market Basket Analysis by Gemy Pham
The retail and e-commerce sectors are changing dramatically as more and more Americans prefer to shop online. Online retailers face common business problems, such as how to implement more data-driven customer retention strategies and how to understand customer preferences by figuring out who their customers are and what they want. At the same time, these businesses are collecting data on millions of transactions. Data scientists can use this data to help them create more value.
Instacart is a grocery ordering and delivery app. You select products in the app, and personal shoppers review your order, do the in-store shopping, and deliver the groceries to you. In other words, Instacart delivers groceries from your favorite stores to your door. The company is expanding its platform to cover 90 million US households in 2018. Because Instacart handles millions of transactions in real time, its problem is representative of the kind of problem I want to work on as a data scientist: predicting customer behavior from large amounts of data. This project will focus on:
- Which products will a user buy again, try for the first time, or add to their cart next during a session?
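One common way to make the "buy again" question concrete, assumed here rather than taken from this project, is to treat it as binary classification over (user, product) pairs. Every product a user bought in earlier orders becomes a candidate, and its label is whether it appears in that user's next order. The sketch below uses made-up toy baskets and hypothetical helpers `build_candidates` and `predict_reorders`. It attaches a single reorder-rate feature (the fraction of prior orders that contain the product), which a naive baseline can threshold.

```python
from collections import Counter


def build_candidates(prior_orders, next_order):
    """Return {product: (reorder_rate, label)} for one user (hypothetical helper).

    prior_orders: list of baskets (each a list of product ids), oldest first.
    next_order: the basket whose contents we want to predict.
    """
    n_orders = len(prior_orders)
    # Count how many prior orders contain each product; set() ignores repeats within one basket.
    counts = Counter(p for basket in prior_orders for p in set(basket))
    next_set = set(next_order)
    return {
        product: (count / n_orders, int(product in next_set))
        for product, count in counts.items()
    }


def predict_reorders(candidates, threshold=0.5):
    """Naive baseline: predict products bought in at least `threshold` of prior orders."""
    return sorted(p for p, (rate, _) in candidates.items() if rate >= threshold)


# Toy data with hypothetical product names, not rows from the Instacart files.
prior = [["banana", "milk", "eggs"], ["banana", "bread"], ["banana", "milk"]]
nxt = ["banana", "milk", "yogurt"]

cands = build_candidates(prior, nxt)
print(predict_reorders(cands))  # ['banana', 'milk']
```

This framing only covers re-purchases. "yogurt" never appears as a candidate, so predicting first-time purchases would need a separate candidate source, such as popular products the user has not bought yet.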
Instacart open-sourced a dataset of 3 million of its orders. The data is also available on Kaggle: https://www.kaggle.com/c/instacart-market-basket-analysis/data
This anonymized dataset contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, Instacart provides between 4 and 100 of their orders, along with the sequence in which products were added to each order. The data also records the day of the week and hour of the day each order was placed, and a relative measure of the time between orders.
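To make this structure concrete, the sketch below joins the two main tables into per-user baskets in chronological order. It assumes the column names used in the Kaggle files `orders.csv` and `order_products__prior.csv` (`order_number`, `order_dow`, `order_hour_of_day`, `days_since_prior_order`, `add_to_cart_order`). Check these names against the downloaded data before relying on them. The rows are made-up toy values and `load_user_sequences` is a hypothetical helper. At the scale of roughly 3 million orders, pandas `merge` and `groupby` would be more practical. The standard library is used here only to keep the example dependency-free.

```python
import csv
import io
from collections import defaultdict

# Toy rows in the assumed orders.csv layout; days_since_prior_order is blank for a user's first order.
ORDERS_CSV = """order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
10,1,prior,2,3,9,7.0
11,1,prior,1,1,8,
20,2,prior,1,5,14,
"""

# Toy rows in the assumed order_products__prior.csv layout (fake product ids).
ORDER_PRODUCTS_CSV = """order_id,product_id,add_to_cart_order,reordered
10,101,2,1
10,202,1,0
11,101,1,0
20,303,1,0
"""


def load_user_sequences(orders_file, products_file):
    """Return {user_id: [(order_number, order_dow, hour, days_since_prior, basket), ...]}.

    Each basket lists product ids in add-to-cart order; each user's orders are sorted by order_number.
    """
    # Group each order's products, keeping the add-to-cart position so they can be sorted.
    items = defaultdict(list)
    for row in csv.DictReader(products_file):
        items[int(row["order_id"])].append((int(row["add_to_cart_order"]), int(row["product_id"])))

    sequences = defaultdict(list)
    for row in csv.DictReader(orders_file):
        gap = row["days_since_prior_order"]
        basket = [pid for _, pid in sorted(items[int(row["order_id"])])]
        sequences[int(row["user_id"])].append((
            int(row["order_number"]),
            int(row["order_dow"]),
            int(row["order_hour_of_day"]),
            float(gap) if gap else None,  # None marks the user's first order
            basket,
        ))
    # order_number gives each user's chronological order sequence.
    for orders in sequences.values():
        orders.sort(key=lambda o: o[0])
    return dict(sequences)


seqs = load_user_sequences(io.StringIO(ORDERS_CSV), io.StringIO(ORDER_PRODUCTS_CSV))
print(seqs[1])  # [(1, 1, 8, None, [101]), (2, 3, 9, 7.0, [202, 101])]
```

With each user's history in this shape, the last order in the sequence can serve as the prediction target and the earlier orders as input. This matches the candidate framing of the "buy again" question.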