Predicting Customer Lifetime Value with Kubeflow Pipelines
Overview
This repository contains the source code for the Predicting Customer Lifetime Value with Kubeflow Pipelines technical guide.
The guide automates the Customer Lifetime Value (CLV) modeling techniques described in the Predicting Customer Lifetime Value with AI Platform series of articles.
The primary goal of the guide is to demonstrate how to orchestrate two machine learning workflows:
- The training and deployment of the Customer Lifetime Value predictive model.
- Batch inference using the deployed Customer Lifetime Value predictive model.
The following diagram depicts the high-level architecture:
The Kubeflow Pipelines services are hosted on Google Kubernetes Engine running on Google Cloud Platform. The training and inference pipelines access the BigQuery and AutoML Tables services through a set of Kubeflow Pipelines components that wrap the respective Google Cloud APIs. The container images for the components used by the pipelines are managed in Container Registry.
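As an illustration of this approach, a pipeline loads the component specifications and chains the resulting ops inside a `kfp.dsl.pipeline` function. The sketch below is a minimal, hypothetical example: the component file paths and parameter names are assumptions, not the repo's actual definitions (see the /pipelines and /components folders for those).

```python
# A minimal sketch of a KFP pipeline that chains AutoML Tables
# components. The component paths and parameter names below are
# hypothetical; see /pipelines and /components for the real ones.
import kfp
from kfp import dsl

# Load component specifications (paths are assumptions).
import_dataset_op = kfp.components.load_component_from_file(
    'components/automl_tables/import_dataset/component.yaml')
train_model_op = kfp.components.load_component_from_file(
    'components/automl_tables/train_model/component.yaml')

@dsl.pipeline(
    name='CLV training',
    description='Trains and deploys a CLV model with AutoML Tables.')
def clv_train(project_id: str, location: str, source_query: str):
    # Import the features prepared in BigQuery into AutoML Tables.
    import_task = import_dataset_op(
        project_id=project_id,
        location=location,
        source_query=source_query)
    # Train a model on the imported dataset.
    train_model_op(
        project_id=project_id,
        location=location,
        dataset_id=import_task.outputs['dataset_id'])
```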
Refer to the README in the /pipelines folder of this repo for more details on the design and usage of the training and deployment pipelines.
Refer to the README in the /components/automl_tables folder of this repo for more details on the design and usage of the AutoML Tables components.
Installing Kubeflow Pipelines
The guide has been developed and tested on Kubeflow Pipelines running on Google Kubernetes Engine (GKE) on Google Cloud Platform.
You can run the solution on a full Kubeflow installation or on a lightweight deployment that includes only the core Kubeflow Pipelines services. The full Kubeflow installation can be provisioned by following the Kubeflow on GCP guide. The lightweight Kubeflow Pipelines deployment can be performed using the automation script delivered as part of the guide.
Refer to the README in the /install folder of this repo for detailed installation instructions.
Building and deploying
Building and deploying the solution's components has been automated using Cloud Build.
Refer to the README in the /deploy folder of this repo for detailed deployment instructions.
Running the pipelines
There are two ways to run the solution's pipelines:
- Using the Kubeflow Pipelines UI
- Using the KFP SDK
Refer to the README in the /run folder of this repo for detailed instructions on how to trigger runs.
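For the programmatic route, a run can be submitted with the KFP SDK's `kfp.Client()`. The following is a minimal sketch: the host URL, pipeline package name, experiment name, and arguments are placeholders, not values from this repo.

```python
import kfp

# Connect to the KFP services; replace the host with your
# deployment's endpoint (placeholder below).
client = kfp.Client(host='https://<your-kfp-host>/pipeline')

# Submit a run of a compiled pipeline package; the package path,
# experiment name, and arguments are assumptions.
run = client.create_run_from_pipeline_package(
    pipeline_file='clv_train.yaml',
    arguments={'project_id': 'my-project', 'location': 'us-central1'},
    experiment_name='clv')
print(run.run_id)
```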
Repository structure
/pipelines
The source code for two template KFP pipelines:
- The pipeline that automates CLV model training and deployment
- The pipeline that automates batch inference
/components
The source code for the KFP components that wrap selected AutoML Tables APIs.
/install
The Kubeflow Pipelines installation script.
/deploy
Cloud Build configuration for automated building and deployment.
/run
Sample code demonstrating how to use the kfp.Client() programmatic interface to KFP services.
Acknowledgements
The sample dataset used in the solution accelerator is based on the publicly available Online Retail Data Set from the UCI Machine Learning Repository.
The original dataset was preprocessed to conform to the schema used by the solution's pipelines and uploaded to a public GCP bucket as gs://clv-datasets/transactions/.
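To take a quick look at the sample data, you can read it directly from the bucket. A minimal sketch, assuming pandas with gcsfs installed; the exact object name under the path is an assumption (list the bucket to find the actual files).

```python
import pandas as pd

# Read a transactions file directly from the public bucket.
# The file name is a placeholder; list gs://clv-datasets/transactions/
# to see the actual objects.
df = pd.read_csv('gs://clv-datasets/transactions/transactions.csv')
print(df.head())
```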