Serverless Semantic Video Search Using a Vector Database and a Multi-Modal Generative AI Embeddings Model
You can find the blog post related to this repository here: Implement serverless semantic search of image and live video with Amazon Titan Multimodal Embeddings!
Deploying the infrastructure requires sufficient AWS privileges.
Warning
This example is for experimental purposes only and is not production-ready. Deploying this sample can incur costs. Please make sure to remove the infrastructure via the provided scripts when it is no longer needed.
- AWS Account Prerequisites
- Deploy to Amplify
- Local Development Prerequisites
- Local Build
- Clean Up
- Usage Instructions
- Solution Walkthrough
- Enable model access for the Amazon Bedrock Titan Multimodal Embeddings G1 model using these instructions
- Click the button above to deploy this solution with default parameters directly in your AWS account, or use the Amplify Console to set up GitHub access.
- In the select service role section, create a new service role. See Amplify Service Role for the permissions required by the deployment role.
Caution
We advise you to restrict access to branches with a username and password, following this guide, to limit resource consumption by unintended users.
- Add a SPA Redirect
- Attach AdministratorAccess rather than AdministratorAccess-Amplify
- Optional: You can use AdministratorAccess-Amplify and add a new IAM policy with the additional required permissions (see the sketch after this list), which may include:
- "aoss:BatchGetCollection"
- "aoss:CreateAccessPolicy"
- "aoss:CreateCollection"
- "aoss:GetSecurityPolicy"
- "aoss:CreateSecurityPolicy"
- "aoss:DeleteSecurityPolicy"
- "aoss:DeleteCollection"
- "aoss:DeleteAccessPolicy"
- "aoss:TagResource"
- "aoss:UntagResource"
- "kms:Decrypt"
- "kms:Encrypt"
- "kms:DescribeKey"
- "kms:CreateGrant"
- AWS CLI
- Python 3.11
- pip 24.0 or higher
- virtualenv 20.25.0 or higher
- Node.js v20.10.0 or higher
- npm 10.5.0 or higher
- Amplify CLI 12.10.1 or higher
- Use us-east-1 as the deployment region
- See Amplify Service Role for the permissions required by the deployment role
```sh
# Initialize the Amplify project in this repository
amplify init
# Install frontend dependencies
npm ci
# Provision the backend resources in your AWS account
amplify push
# Start the local development server
npm run dev
```
Important
We advise you to run the application in a sandbox account and deploy the frontend locally.
Caution
Using the cloud-hosted frontend with the default Cognito setting, which allows any user to create and confirm an account, lets anyone who knows the deployed URL upload images and videos, potentially incurring unexpected charges in your AWS account. You can require human review of new sign-up requests in Cognito by following the instructions in the Cognito Developer Guide for allowing users to sign up in your app but confirming them as a user pool administrator.
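If you enable manual review, a reviewed and approved sign-up can be confirmed with the AdminConfirmSignUp API. A minimal sketch, assuming placeholder user pool ID and username values:

```python
import boto3

# Placeholder values -- substitute your user pool ID and the username of a
# sign-up request you have reviewed and approved.
USER_POOL_ID = "us-east-1_EXAMPLE"
USERNAME = "new-user@example.com"

cognito = boto3.client("cognito-idp", region_name="us-east-1")

# Confirm the sign-up as a user pool administrator, so self-service
# confirmation can remain disabled.
cognito.admin_confirm_sign_up(UserPoolId=USER_POOL_ID, Username=USERNAME)
```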
Follow the instructions to create a redirect for single-page web apps (SPA).
- Full Cleanup Instructions
```sh
# For local builds: deletes the Amplify project and its backend resources
amplify delete
```
1. Use the **Sign In** button to log in. Use the **Create Account** tab located at the top of the website to sign up for a new user account with your Amazon Cognito integration.
2. After successfully signing in, choose from the left sidebar to upload an image or video:
   - Click on the **Choose files** button, select the images or videos from your local drive, then click on **Upload Files**.
   - Click **Allow** when your browser asks for permission to access your webcam. Click **Capture Image** and **Upload Image** when you want to upload a single image from your webcam, or click **Start Video Capture**, **Stop Video Capture**, and finally **Upload Video** to upload a video from your webcam.
3. Type your prompt in the **Search Videos** text field. Depending on your input in the previous steps, you can prompt e.g. "Show me a person with glasses".
4. Lower the confidence parameter closer to 0 if you see fewer results than you were originally expecting.
Tip
The confidence is not a linear scale from 0 to 100. It represents the vector distance between the user's query and the image in the database, where 0 represents completely opposite vectors and 100 represents an identical vector.
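As a rough illustration (an assumption for clarity, not necessarily the repository's exact formula), a score with these endpoints can be produced by rescaling cosine similarity from [-1, 1] to [0, 100]:

```python
import numpy as np

def confidence(query_vec: np.ndarray, image_vec: np.ndarray) -> float:
    """Rescale cosine similarity from [-1, 1] to a 0-100 score."""
    cos = float(np.dot(query_vec, image_vec)) / (
        np.linalg.norm(query_vec) * np.linalg.norm(image_vec)
    )
    return (cos + 1.0) / 2.0 * 100.0

a = np.array([1.0, 0.0])
print(confidence(a, a))                     # 100.0 -- identical vectors
print(confidence(a, -a))                    # 0.0   -- completely opposite
print(confidence(a, np.array([0.0, 1.0])))  # 50.0  -- orthogonal (unrelated)
```

Because even unrelated (orthogonal) vectors score around 50 on such a scale, dropping the threshold well below 100 can still return meaningful matches.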
Raw Solution Architecture Diagram
- Amazon OpenSearch Serverless
- Amazon Bedrock
- AWS Lambda
- Amazon S3
- Amazon Cognito
- AWS Elemental MediaConvert
- AWS Amplify [deploys and hosts the frontend and backend]
- Amazon CloudFront [optional, used with the cloud-hosted frontend]
1. User manually uploads video clips to an S3 bucket (console, CLI, or SDK).
2. The S3 bucket that holds video clips triggers an s3:ObjectCreated event for each clip (MP4 or WebM) stored in S3.
3. A Lambda function subscribed to the S3 bucket's s3:ObjectCreated event queues a MediaConvert job to convert the video clip into JPEG images.
4. MediaConvert saves the converted images into an S3 bucket.
5. The S3 bucket triggers an s3:ObjectCreated event for each image (JPEG) stored in S3.
6. A Lambda function subscribed to the s3:ObjectCreated event generates an embedding with Amazon Titan Multimodal Embeddings for every new image (JPEG) stored in the S3 bucket (see the sketch after this list).
7. The Lambda function stores the embeddings in an OpenSearch Serverless index.
8. Alternatively, video clips can be ingested from a video source into an Amazon Kinesis video stream.
9. Kinesis Video Streams saves the stream as video clips in the S3 bucket, which triggers the same path as steps 2-7 above.
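A minimal sketch of steps 6-7, embedding a frame with Titan Multimodal Embeddings and indexing it in OpenSearch Serverless. This is not the repository's actual Lambda code; the collection endpoint and index name are placeholder assumptions:

```python
import base64
import json

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "us-east-1"
# Placeholder values -- the real names come from the deployed stack.
COLLECTION_ENDPOINT = "example-collection.us-east-1.aoss.amazonaws.com"
INDEX_NAME = "video-frames"  # assumed to exist with a knn_vector field "vector"

bedrock = boto3.client("bedrock-runtime", region_name=REGION)
auth = AWSV4SignerAuth(boto3.Session().get_credentials(), REGION, "aoss")
opensearch = OpenSearch(
    hosts=[{"host": COLLECTION_ENDPOINT, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)


def index_frame(image_bytes: bytes, s3_key: str) -> None:
    """Embed one JPEG frame with Titan Multimodal Embeddings and index it."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps({"inputImage": base64.b64encode(image_bytes).decode()}),
        contentType="application/json",
        accept="application/json",
    )
    embedding = json.loads(response["body"].read())["embedding"]
    opensearch.index(index=INDEX_NAME, body={"vector": embedding, "s3_key": s3_key})
```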
10. User browses the website.
11. CloudFront CDN fetches the static web files from S3.
12. User authenticates and gets a token from the Cognito user pool.
13. User makes a search request on the website, which passes the request to API Gateway.
14. API Gateway forwards the request to a Lambda function.
15. The Lambda function passes the search query to Amazon Titan Multimodal Embeddings and converts it into an embedding.
16. The Lambda function runs a search with the embedding; OpenSearch returns the matching embeddings, and the Lambda function returns the matching images to the user (see the sketch after this list).
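A minimal sketch of steps 15-16, reusing the `bedrock` and `opensearch` clients and `INDEX_NAME` from the ingestion sketch above; the document field names remain placeholder assumptions:

```python
def search_frames(query: str, k: int = 5) -> list[str]:
    """Embed a text query and run a k-NN search against the frame index."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps({"inputText": query}),
        contentType="application/json",
        accept="application/json",
    )
    query_vector = json.loads(response["body"].read())["embedding"]

    results = opensearch.search(
        index=INDEX_NAME,
        body={
            "size": k,
            "query": {"knn": {"vector": {"vector": query_vector, "k": k}}},
        },
    )
    # Return the S3 keys of the best-matching frames.
    return [hit["_source"]["s3_key"] for hit in results["hits"]["hits"]]
```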
While this solution doesn't create or manage a Kinesis video stream, the website does include functionality for displaying a live stream from a self-managed Kinesis video stream and for replaying its video clips when an image is selected.
You can turn on this functionality by setting the kinesisVideoStreamIntegration parameter in the frontend CloudFormation template to True and setting KINESIS_VIDEO_STREAM_INTEGRATION to true in vite.config.js.
Warning
Making all the changes below does not guarantee a production-ready environment. Before using this solution in production, carefully review all the deployed resources and their configuration to ensure they meet your organization's AWS Well-Architected Framework requirements.
- OpenSearch configuration
- Amazon S3 configuration
- Lambda configuration
- IAM configuration
- Cognito configuration
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.