
Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

Product Catalog Enrichment

This repository showcases the application of Google Cloud's Generative AI for product cataloging. The solution will:

  • Suggest a product category given a sparse product description and, optionally, a product image
  • Generate product attributes
  • Generate detailed marketing copy for these products

These results are grounded in a customer-provided product catalog, which makes them more specific and relevant than a purely generative approach would.
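Grounding here means the model sees real catalog entries alongside the new product. The following is a minimal sketch of that idea, assuming the similar products have already been retrieved from the catalog (for example by vector search). `build_grounded_prompt` and its prompt wording are hypothetical and are not the prompts used in the notebooks.

```python
from typing import Dict, List


def build_grounded_prompt(description: str, neighbors: List[Dict]) -> str:
    """Build a prompt that grounds generation in retrieved catalog rows.

    Hypothetical helper: each neighbor is assumed to be a dict with
    "description", "category" and "attributes" keys taken from the
    customer-provided product catalog.
    """
    lines = ["Similar products from the catalog:"]
    for i, n in enumerate(neighbors, start=1):
        # Sort attribute keys so the prompt text is deterministic.
        attrs = ", ".join(f"{k}: {v}" for k, v in sorted(n["attributes"].items()))
        lines.append(f"{i}. {n['description']} | category: {n['category']} | {attrs}")
    lines.append("")
    # Constrain the model to vocabulary that already exists in the catalog.
    lines.append("Using only categories and attribute names that appear above,")
    lines.append(f"suggest a category and attributes for: {description}")
    return "\n".join(lines)
```

Restricting the answer to categories and attribute names seen in the retrieved rows is what keeps the output consistent with the customer's existing taxonomy.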

The repository consists of:

  1. Notebooks: Intended for educational purposes to demonstrate the solution methodology

  2. Sample Code: The code has 100% test coverage, conforms to the OpenAPI specification, and can be used to accelerate deployment of a production solution.

  3. Documentation: In addition to documentation in the source code, the repo contains step-by-step deployment guidance.

  4. Example Frontend [coming soon]: Can be used for demonstration purposes or to accelerate deployment of a customer-specific frontend.

Notebooks

The notebooks listed below explain the concepts demonstrated in this repository:

  • Exploratory Data Analysis (0_EDA_flipkart_dataset.ipynb): Introduces the sample dataset and formats it for use in downstream notebooks.
  • Generate Embeddings (1_generate_embeddings.ipynb): Uses the Vertex Embedding API to generate multimodal embeddings based on product description and image.
  • Create Vector DB (2_create_vector_db.ipynb): Stores embeddings in a low-latency vector DB powered by Vertex Vector Search.
  • Product Categorization (3_product_categorization.ipynb): Demonstrates a RAG approach to categorizing new products.
  • Attribute Generation (4_attribute_generation.ipynb): Demonstrates a RAG approach to generating product attributes.
  • Marketing Description (5_marketing_description.ipynb): Uses a Vertex LLM to generate marketing copy for products.
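The retrieval half of the RAG categorization approach can be pictured as a k-nearest-neighbour lookup over product embeddings. This is a minimal sketch, not the notebooks' code: it assumes embeddings are plain lists of floats, replaces Vertex Vector Search with a linear cosine-similarity scan, and uses a simple vote where the notebooks use an LLM. `cosine` and `suggest_category` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_category(query: Sequence[float],
                     catalog: List[Tuple[List[float], str]],
                     k: int = 3) -> List[Tuple[str, int]]:
    """Rank categories by how often they appear among the k most similar products.

    `catalog` holds (embedding, category) pairs. A linear scan stands in for
    the Vertex Vector Search index built in 2_create_vector_db.ipynb.
    """
    nearest = sorted(catalog, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    # Count category occurrences among the neighbours, most frequent first.
    return Counter(category for _, category in nearest).most_common()
```

Passing the ranked candidates, rather than only the winner, into the LLM prompt is one way to keep the final suggestion tied to categories that already exist in the catalog.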

REST API Deployment

Prior to deploying, you must:

  1. Create Product Reference Table in BigQuery

The table should conform to the following schema, although the column names can differ: config.py provides a mechanism for mapping your column names to the fields below. The notebook 0_EDA_flipkart_dataset.ipynb demonstrates how to create a compatible table; feel free to use it as a reference.

| Field | Description | Required? | Type |
| --- | --- | --- | --- |
| id | Unique product ID | yes | string |
| description | Prose description | yes | string |
| category_l1 | Top-level (L1) category | yes | string |
| category_l2 | L2 category | no | string |
| category_l3 | L3 category | no | string |
| category_… | And so on for the remaining category levels… | no | string |
| attributes | Attribute key-value pairs, e.g. {"color": "green", "material": "polyester"} | yes | JSON |
| main_image | GCS URI of the product image used for embedding | yes | string |
| last_updated | When this row was created or last modified | yes | timestamp |
  2. Create embeddings and the vector DB (1_generate_embeddings.ipynb and 2_create_vector_db.ipynb)
  3. Update configuration variables in /backend/config.py
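Before generating embeddings, it can help to check rows against the schema above. The sketch below is illustrative only: `COLUMN_MAP` stands in for the name-mapping mechanism in config.py (its real variable names and format are not shown here), the sample column names are made up, and the `gs://` and `datetime` checks assume rows shaped like those returned by the BigQuery Python client. `validate_row` is a hypothetical helper.

```python
import json
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical mapping from schema field names to your table's column names.
# The real mapping lives in backend/config.py and may use a different format.
COLUMN_MAP: Dict[str, str] = {
    "id": "product_id",
    "description": "product_description",
    "main_image": "image_uri",
}

REQUIRED_FIELDS = ["id", "description", "category_l1", "attributes",
                   "main_image", "last_updated"]


def validate_row(row: Dict[str, Any],
                 column_map: Dict[str, str] = COLUMN_MAP) -> List[str]:
    """Return the problems found in one reference-table row (empty if valid)."""

    def col(field: str) -> str:
        # Fields without an explicit mapping keep their schema name.
        return column_map.get(field, field)

    problems = [
        f"missing required field {f!r} (column {col(f)!r})"
        for f in REQUIRED_FIELDS
        if row.get(col(f)) in (None, "")
    ]

    attrs = row.get(col("attributes"))
    if isinstance(attrs, str) and attrs:
        try:
            attrs = json.loads(attrs)  # JSON columns may arrive as strings
        except json.JSONDecodeError:
            problems.append("attributes is not valid JSON")
            attrs = None
    if attrs and not isinstance(attrs, dict):
        problems.append("attributes must be a JSON object of key-value pairs")

    image = row.get(col("main_image"))
    if image and not str(image).startswith("gs://"):
        problems.append("main_image should be a gs:// URI")

    updated = row.get(col("last_updated"))
    if updated is not None and not isinstance(updated, datetime):
        problems.append("last_updated should be a timestamp")
    return problems
```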

Then deploy the backend locally or to the cloud:

Local Deploy

cd backend
pip install uvicorn
uvicorn api:app

Cloud Deploy

The code is Dockerized, so you can deploy it on any Docker-compatible infrastructure, including GKE, Cloud Run, or App Engine. Below are instructions for deploying to App Engine:

cd backend

Open the app.yaml configuration file and include your service account:

service_account: <REPLACE WITH YOUR SERVICE ACCOUNT ADDRESS>

The service account will require the following IAM roles:

  • Vertex AI Service Agent
  • BigQuery Data Viewer
  • App Engine Deployer
  • Logs Writer
  • Storage Object Viewer
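To grant these roles from the command line, a minimal sketch could generate one `gcloud projects add-iam-policy-binding` command per role. The role IDs in `REQUIRED_ROLES` are assumed to be the standard IDs for the role names above; verify them against the IAM documentation before use. `grant_commands` is a hypothetical helper, and the project ID and account address are placeholders.

```python
import shlex
from typing import Dict, List

# Assumed IDs for the role names listed above; verify them in the IAM docs.
REQUIRED_ROLES: Dict[str, str] = {
    "Vertex AI Service Agent": "roles/aiplatform.serviceAgent",
    "BigQuery Data Viewer": "roles/bigquery.dataViewer",
    "App Engine Deployer": "roles/appengine.deployer",
    "Logs Writer": "roles/logging.logWriter",
    "Storage Object Viewer": "roles/storage.objectViewer",
}


def grant_commands(project_id: str, service_account: str) -> List[List[str]]:
    """Build one `gcloud projects add-iam-policy-binding` command per role."""
    return [
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{service_account}", f"--role={role}"]
        for role in REQUIRED_ROLES.values()
    ]


if __name__ == "__main__":
    # Placeholder project and account; print the commands for review first.
    # Each one can then be run with subprocess.run(cmd, check=True).
    for cmd in grant_commands("my-project", "app-sa@my-project.iam.gserviceaccount.com"):
        print(shlex.join(cmd))
```

Generating the commands instead of executing them directly makes it easy to review the bindings before applying them to a project.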

Deploy the solution to App Engine:

gcloud app deploy

Wait for the application to deploy, then open the link generated by App Engine.

REST API Docs

Once you've deployed the backend (either locally or to the cloud), browse to http://<deployment-address-here>/docs for full documentation on how to call the API.
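The same documentation is available in machine-readable form if, as the `uvicorn api:app` command and the /docs page suggest (but this README does not state), the backend is a FastAPI app, which by default serves its OpenAPI document at /openapi.json. The sketch below assumes that default path; `fetch_openapi` and `list_operations` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def fetch_openapi(base_url: str) -> Dict[str, Any]:
    """Download the OpenAPI document (assumes FastAPI's default /openapi.json path)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (METHOD, path, summary) for every operation in an OpenAPI document."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            # Path items can also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS:
                ops.append((method.upper(), path, op.get("summary", "")))
    return sorted(ops, key=lambda o: (o[1], o[0]))
```

For a local deployment started with `uvicorn api:app`, which listens on http://127.0.0.1:8000 by default, `list_operations(fetch_openapi("http://127.0.0.1:8000"))` lists the available endpoints.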

Getting help

If you have any questions or find any problems with this repository, please report them through GitHub issues.