Scaling deep retrieval workloads with NVIDIA Merlin and NVTabular on Google Cloud's Vertex AI platform
Vertex AI is Google Cloud's unified machine learning platform, helping data scientists and ML engineers accelerate experimentation, deploy models faster, and manage models with confidence.
NVIDIA Merlin is an open-source framework for building large-scale deep learning recommender systems.
See this repo for a sample development workflow:
- launch Vertex pipeline to orchestrate GPU-based data preprocessing with NVTabular
- build a two-tower encoder with Merlin Models
- prepare training application (container/image) with Cloud Build
- scale training with Vertex AI and A100 GPUs
- prepare serving application (container/image) with Cloud Build
- deploy trained query tower to Vertex AI Prediction endpoint
- 03b-build-docker.ipynb - (optionally) prepare serving application with Docker
- use candidate embeddings generated by the Vertex AI Training job to create a Matching Engine serving index
- create and deploy ANN and brute-force indexes
- compute recall and retrieval latency
- orchestrate e2e model training and deployment (notebooks 02 and 03)
- create candidate index and deploy to index endpoint with Vertex AI Matching Engine
- using trained towers and deployed Matching Engine index, generate playlist recommendations for your own (or any public) Spotify playlist(s)
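At serving time, the retrieval step above amounts to scoring a playlist (query) embedding against all candidate track embeddings and keeping the top-k matches. A minimal NumPy sketch of that brute-force scoring, with toy random embeddings standing in for the trained query and candidate towers:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy embeddings standing in for the trained towers: the query tower maps a
# playlist to a vector, the candidate tower maps each track to a vector in
# the same space. Dimensions here are illustrative only.
EMBED_DIM = 32
NUM_TRACKS = 1000

candidate_embeddings = rng.normal(size=(NUM_TRACKS, EMBED_DIM)).astype(np.float32)
playlist_embedding = rng.normal(size=(EMBED_DIM,)).astype(np.float32)

def retrieve_top_k(query: np.ndarray, candidates: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k candidates with the highest dot-product score."""
    scores = candidates @ query                # one score per track
    top_k = np.argpartition(-scores, k)[:k]   # unordered top-k (O(n))
    return top_k[np.argsort(-scores[top_k])]  # sorted by descending score

recommendations = retrieve_top_k(playlist_embedding, candidate_embeddings, k=10)
print(recommendations)
```

Matching Engine replaces this exhaustive scan with an approximate nearest neighbor index, trading a small amount of recall for much lower latency at scale.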
The Python modules are in the src folder:
- src/preprocessor - data preprocessing utility functions and classes
- src/process_pipes - Vertex pipeline components for orchestrating data preprocessing
- src/serving/app - deployment and serving utility functions and classes
- src/train_pipes - Vertex pipeline components for orchestrating the training and deployment pipeline
- src/trainer - model definitions and training application
- Operationalize large scale data preprocessing pipelines with NVTabular and Vertex AI Pipelines
- Scaling model training with Vertex AI Training
- Building and deploying a scalable approximate nearest neighbor (ANN) index using Vertex AI Matching Engine
- Deploying trained models and serving predictions with Vertex AI Prediction (Triton server coming soon)
- Serving deployment with Triton Inference Server on Vertex AI Prediction
- Add ranking model workflow and demonstrate end-to-end retrieval -> ranking deployment
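The ANN-vs-brute-force recall comparison mentioned above reduces to measuring, per query, how many of the exact (brute-force) top-k neighbors the ANN index also returns. A minimal sketch with hypothetical neighbor ids:

```python
import numpy as np

def recall_at_k(ann_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Fraction of exact (brute-force) neighbors recovered by the ANN index."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical top-10 neighbor ids returned for one query by each index.
exact = np.array([3, 7, 1, 9, 4, 2, 8, 0, 5, 6])
ann = np.array([3, 7, 1, 9, 4, 2, 8, 0, 11, 12])

print(recall_at_k(ann, exact))  # 0.8
```

In practice this is averaged over many queries; the deployed brute-force index exists purely to provide the exact ground truth for this check.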
Spotify's Million Playlist Dataset (MPD) - see here for downloading and preparing the dataset in BigQuery
Ching-Wei Chen, Paul Lamere, Markus Schedl, and Hamed Zamani. Recsys Challenge 2018: Automatic Music Playlist Continuation. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys ’18), 2018.
- Create and save training and validation splits
- Store data split statistics and schema (NVTabular's workflow)
- Transform data splits and prepare them for training and serving tasks
- Orchestrate these NVTabular pipelines with Vertex Managed Pipelines
- Scale pipeline processing tasks with single or multiple GPU configurations
With 4 Tesla T4 GPUs per processing component, the pipeline processes the Spotify MPD in ~27 minutes
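The split-and-statistics steps above are GPU-accelerated by NVTabular in this repo; as a rough illustration of the idea only (toy data, illustrative column layout, not the actual NVTabular Workflow API), the same logic in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy interaction table standing in for the Spotify MPD rows:
# (playlist_id, track_id) pairs. Values and shapes are illustrative only.
num_rows = 10_000
data = np.stack(
    [rng.integers(0, 1_000, num_rows), rng.integers(0, 5_000, num_rows)],
    axis=1,
)

# Shuffle once, then carve off a held-out validation split.
perm = rng.permutation(num_rows)
split = int(num_rows * 0.8)
train, valid = data[perm[:split]], data[perm[split:]]

# Simple per-split statistics, the kind of metadata NVTabular
# records alongside its schema for downstream training/serving.
stats = {
    "train_rows": len(train),
    "valid_rows": len(valid),
    "train_track_cardinality": int(len(np.unique(train[:, 1]))),
}
print(stats)
```

NVTabular performs these operations out-of-core on GPU, which is what makes the ~27-minute figure above possible on the full dataset.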
- Build custom containers for training and serving
- Train Merlin retrieval model
- Import Query and Candidate Towers to pipeline DAG
- Register and deploy Query Tower with Vertex AI
- Create Matching Engine indexes and index endpoints
- Deploy indexes to index endpoints