Product Catalog Enrichment

This repository showcases the application of Google Cloud's Generative AI for product cataloging. The solution will:

Suggest a product category given a sparse description and (optionally) a product image
Generate product attributes
Generate detailed marketing copy for these products

These results are grounded in a customer-provided product catalog, enabling more specific and relevant results than a purely generative approach would provide.

The repository consists of:

Notebooks: Intended for educational purposes to demonstrate the solution methodology
Sample Code: The code has 100% test coverage, conforms to OpenAPI specifications, and can be used to accelerate deployment of a production solution.
Documentation: In addition to documentation in the source code, the repo contains step-by-step deployment guidance.
Example Frontend [coming soon]: Can be used for demonstration purposes or to accelerate deployment of a customer-specific frontend.

Notebooks

The notebooks listed below were developed to explain the concepts used in this repository:

Exploratory Data Analysis (0_EDA_flipkart_dataset.ipynb): Introduces the sample dataset and formats for use in downstream notebooks.
Generate Embeddings (1_generate_embeddings.ipynb): Use the Vertex Embedding API to generate multimodal embeddings based on product description and image.
Create Vector DB (2_create_vector_db.ipynb): Store embeddings in a low-latency vector database powered by Vertex Vector Search.
Product Categorization (3_product_categorization.ipynb): Demonstrate RAG approach to categorizing new products.
Attribute Generation (4_attribute_generation.ipynb): Demonstrate RAG approach to generating product attributes.
Marketing Description (5_marketing_description.ipynb): Use a Vertex LLM to generate marketing copy for products.
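
The retrieval step behind the categorization notebook can be illustrated with a small, self-contained sketch. It is not the notebooks' implementation: the toy three-dimensional vectors stand in for real multimodal embeddings, an in-memory list stands in for Vertex Vector Search, and a simple vote over the neighbors' categories stands in for the LLM call that the RAG approach would make with the retrieved products as context. All function names here are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_products(query_embedding, catalog, k=3):
    """Return the k catalog items most similar to the query embedding."""
    ranked = sorted(
        catalog,
        key=lambda item: cosine_similarity(query_embedding, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def candidate_categories(neighbors):
    """Rank the neighbors' top-level categories by how often they occur."""
    counts = Counter(item["category_l1"] for item in neighbors)
    return [category for category, _ in counts.most_common()]


# Toy catalog: made-up IDs, categories, and embeddings for illustration only.
catalog = [
    {"id": "p1", "category_l1": "Clothing", "embedding": [0.9, 0.1, 0.0]},
    {"id": "p2", "category_l1": "Clothing", "embedding": [0.8, 0.2, 0.1]},
    {"id": "p3", "category_l1": "Footwear", "embedding": [0.1, 0.9, 0.2]},
    {"id": "p4", "category_l1": "Toys", "embedding": [0.0, 0.1, 0.9]},
]

query = [0.85, 0.15, 0.05]  # embedding of the new, uncategorized product
neighbors = nearest_products(query, catalog, k=3)
categories = candidate_categories(neighbors)
```

In a grounded setup, the candidate categories (or the neighbors themselves) would be passed to the model as context rather than used directly as the answer.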

REST API Deployment

Before deploying, you must:

Create Product Reference Table in BigQuery

The table should conform to the following schema. Column names can differ, because config.py provides a name-mapping mechanism. The notebook 0_EDA_flipkart_dataset.ipynb demonstrates how to create a compatible table and can be used as a reference.

Field	Description	Required?	Type
id	Unique product ID	yes	string
description	Prose description	yes	string
category_l1	Top level (L1) category	yes	string
category_l2	L2 category	no	string
category_l3	L3 category	no	string
category_…	And so on for the remaining category levels…	no	string
attributes	List of attribute key-value pairs, e.g. {"color": "green", "material": "polyester"}	yes	JSON
main_image	GCS URI of the product image to be used for embedding	yes	string
last_updated	When this row was created or last modified	yes	timestamp

Create Embeddings and Vector DB (1_generate_embeddings.ipynb and 2_create_vector_db.ipynb)
Update configuration variables in /backend/config.py
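
As a rough illustration of the reference table schema and the column-name mapping described above, the sketch below renames the columns of a source row to the schema's field names and reports any required fields that are missing. The mapping format, the source column names, and both helper functions are hypothetical; the actual mechanism is whatever /backend/config.py defines.

```python
# Fields marked "yes" in the Required? column of the schema table.
REQUIRED_FIELDS = (
    "id",
    "description",
    "category_l1",
    "attributes",
    "main_image",
    "last_updated",
)

# Hypothetical mapping: schema field name -> column name in your own table.
COLUMN_MAPPING = {
    "id": "product_uid",
    "description": "product_text",
    "category_l1": "top_category",
    "category_l2": "sub_category",
    "attributes": "attrs",
    "main_image": "image_uri",
    "last_updated": "updated_at",
}


def to_solution_row(source_row, mapping):
    """Rename source columns to schema field names (absent columns become None)."""
    return {field: source_row.get(column) for field, column in mapping.items()}


def missing_required(row):
    """Return the required schema fields that are absent or empty in the row."""
    return [field for field in REQUIRED_FIELDS if row.get(field) in (None, "")]


# Made-up example row using the hypothetical source column names.
source_row = {
    "product_uid": "SKU-1",
    "product_text": "Green polyester jacket",
    "top_category": "Clothing",
    "attrs": {"color": "green", "material": "polyester"},
    "image_uri": "gs://example-bucket/sku-1.jpg",
    "updated_at": "2024-01-01T00:00:00Z",
}

row = to_solution_row(source_row, COLUMN_MAPPING)
problems = missing_required(row)  # empty list when the row is complete
```

Optional fields such as category_l2 are allowed to be empty, so they are mapped but not checked.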

Then deploy using one of the options below.

Local Deploy

cd backend
pip install uvicorn
uvicorn api:app

Cloud Deploy

The code is dockerized, so you can deploy it on any Docker-compatible infrastructure, including GKE, Cloud Run, or App Engine. Below are instructions for deploying to App Engine:

cd backend

Open the app.yaml configuration file and include your service account:

service_account: <REPLACE WITH YOUR SERVICE ACCOUNT ADDRESS>

The service account will require the following IAM roles:

Vertex AI Service Agent
BigQuery Data Viewer
App Engine Deployer
Logs Writer
Storage Object Viewer
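
One way to grant these roles is with gcloud projects add-iam-policy-binding, once per role. The sketch below only builds the command strings so they can be reviewed before running. The role IDs are the identifiers these display names are assumed to correspond to, so verify them against the IAM documentation; the project ID and service account address in the example are placeholders.

```python
import shlex

# Display name -> role ID. Assumed mapping: confirm each ID in the IAM docs.
ROLE_IDS = {
    "Vertex AI Service Agent": "roles/aiplatform.serviceAgent",
    "BigQuery Data Viewer": "roles/bigquery.dataViewer",
    "App Engine Deployer": "roles/appengine.deployer",
    "Logs Writer": "roles/logging.logWriter",
    "Storage Object Viewer": "roles/storage.objectViewer",
}


def iam_binding_commands(project_id, service_account):
    """Build one gcloud command per role; nothing is executed here."""
    member = f"serviceAccount:{service_account}"
    return [
        shlex.join(
            [
                "gcloud",
                "projects",
                "add-iam-policy-binding",
                project_id,
                f"--member={member}",
                f"--role={role_id}",
            ]
        )
        for role_id in ROLE_IDS.values()
    ]


# Placeholder project and service account, for illustration only.
commands = iam_binding_commands(
    "my-project", "sa@my-project.iam.gserviceaccount.com"
)
```

Printing the list and running the commands by hand keeps the permission changes explicit and easy to audit.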

Deploy the solution to App Engine:

gcloud app deploy

Wait for the application to be deployed, then open the link generated by App Engine.

REST API Docs

Once you've deployed the backend (either locally or to cloud), browse to http://<deployment-address-here>/docs for full documentation on how to call the API.
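
The same information can also be read programmatically. The sketch below assumes the backend is a FastAPI application, which by default serves its OpenAPI schema at /openapi.json next to /docs; the repository does not state this, so treat the schema path as an assumption. It also assumes a local deployment on uvicorn's default address, http://127.0.0.1:8000. Both helper functions are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def docs_urls(base_url):
    """Build the docs and (assumed) OpenAPI schema URLs for a deployment."""
    base = base_url.rstrip("/") + "/"
    return {
        "docs": urljoin(base, "docs"),
        "openapi": urljoin(base, "openapi.json"),  # FastAPI default, assumed
    }


def list_endpoints(base_url, timeout=10):
    """Fetch the OpenAPI schema and return the sorted endpoint paths."""
    with urlopen(docs_urls(base_url)["openapi"], timeout=timeout) as response:
        spec = json.load(response)
    return sorted(spec.get("paths", {}))


# With the backend running locally, for example:
#     print(list_endpoints("http://127.0.0.1:8000"))
local_urls = docs_urls("http://127.0.0.1:8000")
```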

Getting help

If you have any questions or find any problems with this repository, please report them through GitHub Issues.