Scaling deep retrieval workloads with NVIDIA Merlin and NVTabular on Google Cloud's Vertex AI platform
Vertex AI is Google Cloud's unified machine learning platform, helping data scientists and ML engineers accelerate experimentation, deploy models faster, and manage models with confidence.
NVIDIA Merlin is an open-source framework for building large-scale deep learning recommender systems.
See this repo for a sample development workflow:
- launch Vertex pipeline to orchestrate GPU-based data preprocessing with NVTabular
- build a two-tower encoder with Merlin Models
- prepare training application (container/image) with Cloud Build
- scale training with Vertex AI and A100 GPUs
- prepare serving application (container/image) with Cloud Build
- deploy trained query tower to Vertex AI Prediction endpoint
- 03b-build-docker.ipynb - (optionally) prepare serving application with Docker
- use candidate embeddings generated by the Vertex AI Training job to create a Matching Engine serving index
- create and deploy ANN and brute-force indexes
- compute recall and retrieval latency
- orchestrate e2e model training and deployment (notebooks 02 and 03)
- create candidate index and deploy to index endpoint with Vertex AI Matching Engine
- using trained towers and deployed Matching Engine index, generate playlist recommendations for your own (or any public) Spotify playlist(s)
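At serving time, the retrieval step above amounts to scoring a playlist (query) embedding against all candidate track embeddings and keeping the top-k matches. A minimal NumPy sketch of that brute-force scoring, with toy random embeddings standing in for the trained query and candidate towers:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy embeddings standing in for the trained towers: the query tower maps a
# playlist to a vector, the candidate tower maps each track to a vector in
# the same space. Dimensions here are illustrative only.
EMBED_DIM = 32
NUM_TRACKS = 1000

candidate_embeddings = rng.normal(size=(NUM_TRACKS, EMBED_DIM)).astype(np.float32)
playlist_embedding = rng.normal(size=(EMBED_DIM,)).astype(np.float32)

def retrieve_top_k(query: np.ndarray, candidates: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k candidates with the highest dot-product score."""
    scores = candidates @ query                # one score per track
    top_k = np.argpartition(-scores, k)[:k]   # unordered top-k (O(n))
    return top_k[np.argsort(-scores[top_k])]  # sorted by descending score

recommendations = retrieve_top_k(playlist_embedding, candidate_embeddings, k=10)
print(recommendations)
```

Matching Engine replaces this exhaustive scan with an approximate nearest neighbor index, trading a small amount of recall for much lower latency at scale.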
The Python modules are in the src folder:
- src/preprocessor - data preprocessing utility functions and classes
- src/process_pipes - Vertex pipeline components for orchestrating data preprocessing
- src/serving/app - deployment and serving utility functions and classes
- src/train_pipes - Vertex pipeline components for orchestrating the training and deployment pipeline
- src/trainer - model definitions and training application
- Operationalize large scale data preprocessing pipelines with NVTabular and Vertex AI Pipelines
- Scaling model training with Vertex AI Training
- Building and deploying a scalable approximate nearest neighbor (ANN) index using Vertex AI Matching Engine
- Deploying trained models and serving predictions with Vertex AI Prediction (Triton server coming soon)
- Serving deployment with Triton Inference Server on Vertex AI Prediction
- Add ranking model workflow and demonstrate end-to-end retrieval -> ranking deployment
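The ANN-vs-brute-force recall comparison mentioned above reduces to measuring, per query, how many of the exact (brute-force) top-k neighbors the ANN index also returns. A minimal sketch with hypothetical neighbor ids:

```python
import numpy as np

def recall_at_k(ann_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Fraction of exact (brute-force) neighbors recovered by the ANN index."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical top-10 neighbor ids returned for one query by each index.
exact = np.array([3, 7, 1, 9, 4, 2, 8, 0, 5, 6])
ann = np.array([3, 7, 1, 9, 4, 2, 8, 0, 11, 12])

print(recall_at_k(ann, exact))  # 0.8
```

In practice this is averaged over many queries; the deployed brute-force index exists purely to provide the exact ground truth for this check.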
Spotify's Million Playlist Dataset (MPD) - see here for downloading and preparing the dataset in BigQuery
Ching-Wei Chen, Paul Lamere, Markus Schedl, and Hamed Zamani. Recsys Challenge 2018: Automatic Music Playlist Continuation. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys ’18), 2018.
- Create and save training and validation splits
- Store data split statistics and schema (NVTabular's workflow)
- Transform data splits and prepare them for training and serving tasks
- Orchestrate these NVTabular pipelines with Vertex Managed Pipelines
- Scale pipeline processing tasks with single or multiple GPU configurations
With 4 Tesla T4 GPUs per processing component, the pipeline processes the Spotify MPD in ~27 minutes
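The split-and-statistics steps above are GPU-accelerated by NVTabular in this repo; as a rough illustration of the idea only (toy data, illustrative column layout, not the actual NVTabular Workflow API), the same logic in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy interaction table standing in for the Spotify MPD rows:
# (playlist_id, track_id) pairs. Values and shapes are illustrative only.
num_rows = 10_000
data = np.stack(
    [rng.integers(0, 1_000, num_rows), rng.integers(0, 5_000, num_rows)],
    axis=1,
)

# Shuffle once, then carve off a held-out validation split.
perm = rng.permutation(num_rows)
split = int(num_rows * 0.8)
train, valid = data[perm[:split]], data[perm[split:]]

# Simple per-split statistics, the kind of metadata NVTabular
# records alongside its schema for downstream training/serving.
stats = {
    "train_rows": len(train),
    "valid_rows": len(valid),
    "train_track_cardinality": int(len(np.unique(train[:, 1]))),
}
print(stats)
```

NVTabular performs these operations out-of-core on GPU, which is what makes the ~27-minute figure above possible on the full dataset.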
- Build custom containers for training and serving
- Train Merlin retrieval model
- Import Query and Candidate Towers to pipeline DAG
- Register and deploy Query Tower with Vertex AI
- Create Matching Engine indexes and index endpoints
- Deploy indexes to index endpoints