This repository showcases the application of Google Cloud's Generative AI for product cataloging. The solution will:
- Suggest a product category given a sparse description and (optionally) a product image
- Generate product attributes
- Generate detailed marketing copy for these products
These results are grounded in a customer-provided product catalog, enabling more specific and relevant results than a purely generative approach would provide.
The repository consists of:
- Notebooks: Intended for educational purposes to demonstrate the solution methodology.
- Sample Code: The code has 100% test coverage, conforms to OpenAPI specifications, and can be used to accelerate deployment of a production solution.
- Documentation: In addition to documentation in the source code, the repo contains step-by-step deployment guidance.
- Example Frontend [coming soon]: Can be used for demonstration purposes or to accelerate deployment of a customer-specific frontend.
The notebooks listed below were developed to explain the concepts exposed in this repository:
- Exploratory Data Analysis (0_EDA_flipkart_dataset.ipynb): Introduces the sample dataset and formats for use in downstream notebooks.
- Generate Embeddings (1_generate_embeddings.ipynb): Use the Vertex Embedding API to generate multimodal embeddings based on product description and image.
- Create Vector DB (2_create_vector_db.ipynb): Store embeddings in a low-latency vector DB powered by Vertex Vector Search.
- Product Categorization (3_product_categorization.ipynb): Demonstrate RAG approach to categorizing new products.
- Attribute Generation (4_attribute_generation.ipynb): Demonstrate RAG approach to generating product attributes.
- Marketing Description (5_marketing_description.ipynb): Utilize Vertex LLM to generate marketing copy for products.
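The retrieval step behind the categorization notebook can be illustrated without any cloud services. The sketch below is not the repository's code: it assumes embeddings are plain lists of floats, replaces Vertex Vector Search with a brute-force cosine-similarity scan, and stands in for the generative step with a simple majority vote over the retrieved neighbours. All function names and the toy catalog are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_products(query_embedding, catalog, k=3):
    """Return the k catalog entries most similar to the query embedding."""
    ranked = sorted(
        catalog,
        key=lambda item: cosine_similarity(query_embedding, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def suggest_category(query_embedding, catalog, k=3):
    """Suggest the most common L1 category among the k nearest neighbours."""
    neighbours = nearest_products(query_embedding, catalog, k)
    if not neighbours:
        return None
    votes = Counter(item["category_l1"] for item in neighbours)
    return votes.most_common(1)[0][0]


# Toy catalog with made-up 3-dimensional embeddings (hypothetical data).
CATALOG = [
    {"id": "p1", "category_l1": "Clothing", "embedding": [0.9, 0.1, 0.0]},
    {"id": "p2", "category_l1": "Clothing", "embedding": [0.8, 0.2, 0.1]},
    {"id": "p3", "category_l1": "Footwear", "embedding": [0.1, 0.9, 0.0]},
    {"id": "p4", "category_l1": "Jewellery", "embedding": [0.0, 0.1, 0.9]},
]

if __name__ == "__main__":
    print(suggest_category([0.85, 0.15, 0.05], CATALOG))
```

Grounding in the catalog means a suggestion can only be a category that already exists in the reference table, which is what keeps the results specific to the customer's taxonomy.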
Prior to deploying you must:
- Create Product Reference Table in BigQuery
The table should conform to the following schema, although the column names can differ because we provide a name-mapping mechanism in config.py. The notebook 0_EDA_flipkart_dataset.ipynb demonstrates how to create a compatible table; feel free to use it as a reference.
Field | Description | Required? | Type |
---|---|---|---|
id | Unique product ID | yes | string |
description | Prose description | yes | string |
category_l1 | Top level (L1) category | yes | string |
category_l2 | L2 category | no | string |
category_l3 | L3 category | no | string |
category_… | And so on for the remaining category levels… | no | string |
attributes | List of attribute key-value pairs, e.g. {color: 'green', material: 'polyester'} | yes | JSON |
main_image | GCS URI of the product image to be used for embedding | yes | string |
last_updated | When this row was created or last modified | yes | timestamp |
- Create Embeddings and Vector DB (1_generate_embeddings.ipynb and 2_create_vector_db.ipynb)
- Update configuration variables in /backend/config.py
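To make the reference table schema concrete, the sketch below renames a row's columns to the schema names and reports which required fields are still missing. The mapping dict, the helper names, and the example row are hypothetical; the actual mapping mechanism lives in config.py and may look different.

```python
# Fields marked "yes" in the Required? column of the schema table.
REQUIRED_FIELDS = {
    "id",
    "description",
    "category_l1",
    "attributes",
    "main_image",
    "last_updated",
}

# Hypothetical mapping from your table's column names to the schema names.
COLUMN_MAPPING = {
    "product_id": "id",
    "product_description": "description",
    "image_uri": "main_image",
}


def normalize_row(row, mapping=COLUMN_MAPPING):
    """Rename columns according to the mapping; unmapped columns keep their names."""
    return {mapping.get(name, name): value for name, value in row.items()}


def missing_required_fields(row):
    """Return the required schema fields that are absent or empty, sorted by name."""
    return sorted(f for f in REQUIRED_FIELDS if row.get(f) in (None, ""))


if __name__ == "__main__":
    # Example row using custom column names (made-up values).
    raw_row = {
        "product_id": "SKU-001",
        "product_description": "Green polyester rain jacket",
        "category_l1": "Clothing",
        "attributes": {"color": "green", "material": "polyester"},
        "image_uri": "gs://example-bucket/images/sku-001.jpg",
    }
    print(missing_required_fields(normalize_row(raw_row)))
```

Running a check like this over a sample of rows before generating embeddings is a cheap way to catch mapping mistakes early.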
Then start the API locally:
cd backend
pip install uvicorn
uvicorn api:app
The code is dockerized, so you can deploy it on any Docker-compatible infrastructure, including GKE, Cloud Run, or App Engine. Below are instructions for deploying to App Engine:
cd backend
Open the app.yaml configuration file and include your service account:
service_account: <REPLACE WITH YOUR SERVICE ACCOUNT ADDRESS>
The service account will require the following IAM roles:
- Vertex AI Service Agent
- BigQuery Data Viewer
- App Engine Deployer
- Logs Writer
- Storage Object Viewer
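Granting these roles can be scripted. The sketch below only prints the corresponding `gcloud projects add-iam-policy-binding` commands so you can review them before running anything. The role IDs are an assumed mapping of the display names above to predefined IAM roles, and the project ID and service account address are placeholders; verify both against your own project.

```python
# Assumed mapping from the role display names above to predefined IAM role IDs.
ROLE_IDS = {
    "Vertex AI Service Agent": "roles/aiplatform.serviceAgent",
    "BigQuery Data Viewer": "roles/bigquery.dataViewer",
    "App Engine Deployer": "roles/appengine.deployer",
    "Logs Writer": "roles/logging.logWriter",
    "Storage Object Viewer": "roles/storage.objectViewer",
}


def binding_commands(project_id, service_account):
    """Build one gcloud command per role; nothing is executed here."""
    return [
        "gcloud projects add-iam-policy-binding "
        f"{project_id} "
        f"--member=serviceAccount:{service_account} "
        f"--role={role_id}"
        for role_id in ROLE_IDS.values()
    ]


if __name__ == "__main__":
    # Placeholder values: replace with your own project and service account.
    for command in binding_commands(
        "my-project", "my-sa@my-project.iam.gserviceaccount.com"
    ):
        print(command)
```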
Deploy the solution to App Engine:
gcloud app deploy
Wait for the application to be deployed, then open the link generated by App Engine.
Once you've deployed the backend (either locally or to the cloud), browse to http://<deployment-address-here>/docs for full documentation on how to call the API.
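Because the interactive documentation is served at /docs, the backend likely also exposes a machine-readable OpenAPI document. The sketch below assumes the FastAPI default path /openapi.json, which this README does not confirm, and lists the available routes using only the standard library. The helper names and the local address are illustrative.

```python
import json
from urllib.request import urlopen


def openapi_url(base_url):
    """Build the URL of the OpenAPI document (assumed to be at /openapi.json)."""
    return base_url.rstrip("/") + "/openapi.json"


def list_routes(spec):
    """Return sorted (METHOD, path) pairs from a parsed OpenAPI document."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_routes(base_url, timeout=10):
    """Download the OpenAPI document from a running backend and list its routes."""
    with urlopen(openapi_url(base_url), timeout=timeout) as response:
        return list_routes(json.load(response))


if __name__ == "__main__":
    # Assumes the backend was started locally with `uvicorn api:app` (default port 8000).
    for method, path in fetch_routes("http://127.0.0.1:8000"):
        print(method, path)
```

If the OpenAPI document is served at a different path in your deployment, only `openapi_url` needs to change.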
If you have any questions or find any problems with this repository, please report them through GitHub issues.