Customer Journey Analytics

Do you want to increase revenue and marketing ROI from your e-commerce platform? If so, this project is for you, so read on.

Project Motivation
Results
Installation
File Descriptions
Licensing, Authors, and Acknowledgements

Project Motivation

Customer journey mapping has become very popular in recent years. Getting it right can improve marketing strategy, strengthen personalized branding and offerings, and increase revenue and marketing ROI. To bring value to the business, it requires a healthy balance of qualitative knowledge, which customer-facing functions hold about the market and customers, and quantitative insights, which can be gained from an e-commerce platform, CRM, and other market-related external sources.

The purpose of this project is to share with fellow sales and marketing professionals and data scientists how to approach the quantitative part of customer journey mapping, namely how to answer the following questions:

How many buyer personas do we have?
What are their unique characteristics?
How accurately can we predict buyer persona from the first customer purchase transaction?
How can we adapt the marketing strategy concerning buyer personas to increase ROI?

From a data science perspective, this means:

How to use hierarchical and non-hierarchical clustering to identify buyer personas?
How to use ensemble and linear models to profile the characteristics of buyer personas?
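
As a rough illustration of the two data science questions above, the following minimal sketch clusters synthetic customers with one hierarchical method (Ward linkage) and one non-hierarchical method (k-means), then fits an ensemble model and a linear model on the resulting labels to see which features separate the personas. It is not the notebook's code: the feature names, the synthetic data, the number of personas, and the specific estimators are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Hypothetical customer-level features; the real dataset uses its own variables.
feature_names = ["recency", "frequency", "monetary", "sessions"]

# Synthetic stand-in for customer data: three well-separated groups.
centers = [[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 6]]
X_raw, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X_raw)

n_personas = 3  # assumed here; in practice chosen from a dendrogram or validity indices

# Hierarchical clustering (Ward linkage) and non-hierarchical clustering (k-means).
hier_labels = AgglomerativeClustering(n_clusters=n_personas, linkage="ward").fit_predict(X)
kmeans_labels = KMeans(n_clusters=n_personas, n_init=10, random_state=0).fit_predict(X)

# Agreement between the two solutions (1.0 means identical partitions).
agreement = adjusted_rand_score(hier_labels, kmeans_labels)
print(f"Adjusted Rand index between solutions: {agreement:.2f}")

# Profile the personas: which features best distinguish the clusters?
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, kmeans_labels)
logit = LogisticRegression(max_iter=1000).fit(X, kmeans_labels)

importances = dict(zip(feature_names, forest.feature_importances_))
print("Random forest feature importances:", importances)

# For each persona, the feature with the largest absolute logistic coefficient.
top_features = [feature_names[i] for i in np.argmax(np.abs(logit.coef_), axis=1)]
print("Strongest linear feature per persona:", top_features)
```

Using two clustering families side by side is one way to check that the personas are not an artifact of a single algorithm; a high agreement score suggests a stable segmentation.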

Results

The main findings are summarized in the post available here.

Installation

Beyond the Anaconda distribution of Python, a few third-party libraries need to be installed and imported to run the code. These are:

scikit_posthocs, providing post hoc tests for multiple comparisons
Google Cloud SDK, providing access to BigQuery and the Google Analytics Sample Dataset

File Descriptions

One notebook, available here, showcases the work related to the above questions. Markdown cells walk through the thought process behind the individual steps.

There are additional files:

bigquery_.py provides the custom classes BigqueryTable and BigqueryDataset for querying data from the Google Merchandise Store sample dataset.
helper.py provides custom functions for various analyses, to keep the notebook manageable to read.
custom_pca.py holds an adaptation of the scikit-learn PCA class that includes Varimax Rotation and the Latent Root Criterion.
google_analytics_schema.xlsx contains an analysis of the variables in the BigQuery Export schema, which is the schema used for the Google Analytics Sample.
product_categories.xlsx is used to recode broken product category variables in the dataset.
temp.data.h5 stores codes/levels of each variable in the dataset
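
To make the two ideas named for custom_pca.py more concrete, here is a minimal NumPy-only sketch. It is not the code in custom_pca.py and does not subclass scikit-learn's PCA; the function names and the synthetic data are assumptions made for illustration. The latent root criterion is taken here in its common form (keep components whose correlation-matrix eigenvalue exceeds 1), and varimax is the usual iterative SVD-based orthogonal rotation.

```python
import numpy as np


def pca_latent_root(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0  # latent root (Kaiser) criterion
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a (features x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag(np.sum(rotated**2, axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated**3 - (gamma / p) * rotated @ col_ss)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation


# Synthetic data: six variables driven by two latent factors (illustrative only).
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
noise = 0.3 * rng.normal(size=(500, 6))
X = np.column_stack([factors[:, 0]] * 3 + [factors[:, 1]] * 3) + noise

eigvals, loadings = pca_latent_root(X)
rotated_loadings = varimax(loadings)
print("Eigenvalues:", np.round(eigvals, 2))
print("Components retained:", loadings.shape[1])
print("Rotated loadings:\n", np.round(rotated_loadings, 2))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across components shifts, which usually makes the components easier to interpret.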

Licensing, Authors, and Acknowledgements

Credit goes to Google for the data, to @alexisbcook for a nice introduction to Nested and Repeated Data, and to Daqing Chen, Sai Laing Sain & Kun Guo for their technical article "Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining".