This repository contains a Terraform implementation of a simple Retrieval-Augmented Generation (RAG) use case using Amazon Titan V2 as the embedding model and Claude 3 as the text generation model, both on Amazon Bedrock. This sample follows the user journey described below:
- The user manually uploads a file to Amazon S3, such as a Microsoft Excel or PDF document. The supported file types can be found here.
- The content of the file is extracted and embedded into a knowledge database backed by Amazon Aurora Serverless with PostgreSQL.
- When the user engages with the text generation model, it utilizes previously uploaded files to enhance the interaction through retrieval augmentation.
- Whenever an object is created in the Amazon S3 bucket `bedrock-rag-template-<account_id>`, an Amazon S3 notification invokes the AWS Lambda function `data-ingestion-processor`.
- The AWS Lambda function `data-ingestion-processor` is based on a Docker image stored in the Amazon ECR repository `bedrock-rag-template`. The function uses the LangChain `S3FileLoader` to read the file as a LangChain `Document`. Then, the LangChain `RecursiveCharacterTextSplitter` chunks each document, given a `CHUNK_SIZE` and a `CHUNK_OVERLAP`, which depend on the maximum token size of the embedding model, Amazon Titan Text Embeddings V2. Next, the Lambda function invokes the embedding model on Amazon Bedrock to embed the chunks into numerical vector representations. Lastly, these vectors are stored in the Amazon Aurora PostgreSQL database. To access the Aurora database, the Lambda function first retrieves the username and password from AWS Secrets Manager.
- On the Amazon SageMaker notebook instance `aws-sample-bedrock-rag-template`, the user can write a question prompt. The code invokes Claude 3 on Amazon Bedrock and adds the knowledge base information to the context of the prompt. As a result, Claude 3 answers using the information in the documents.
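The chunking step can be illustrated with a simplified, pure-Python stand-in for LangChain's `RecursiveCharacterTextSplitter`. The real splitter recursively tries several separators (paragraphs, sentences, words) before falling back to a character window; the sketch below only slides a fixed window, and the `chunk_size`/`chunk_overlap` values are illustrative, not the ones configured for the Lambda function:

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into overlapping windows of at most chunk_size characters.

    A minimal stand-in for LangChain's RecursiveCharacterTextSplitter:
    consecutive chunks share chunk_overlap characters so that context
    straddling a boundary is not lost.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# Stand-in for the text extracted from an uploaded S3 document.
document = "word " * 500
chunks = chunk_text(document, chunk_size=200, chunk_overlap=40)
```

Each chunk is then embedded individually, so `chunk_size` must stay below the embedding model's maximum input length.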
The AWS Lambda function `data-ingestion-processor` resides in a private subnet within the VPC, and its security group does not allow it to send traffic to the public internet. As a result, traffic to Amazon S3 and Amazon Bedrock is routed through VPC endpoints only. Consequently, the traffic does not traverse the public internet, which reduces latency and adds an additional layer of security at the networking level.
All resources and data are encrypted wherever applicable using the AWS KMS key with the alias `aws-sample/bedrock-rag-template`.
While this sample can be deployed into any AWS Region, we recommend using `us-east-1` or `us-west-1` due to the availability of foundation and embedding models in Amazon Bedrock at the time of publishing (see Model support by AWS Region for an updated list of Amazon Bedrock foundation model support in AWS Regions). The section Next steps provides pointers on how to use this solution with other AWS Regions.
To run this sample, make sure that you have an active AWS account and access to a sufficiently permissive IAM role in both the AWS Management Console and the AWS CLI.
Enable model access for the required LLMs in the Amazon Bedrock Console of your AWS account. The following models are needed for this example:
- `amazon.titan-embed-text-v2:0`
- `anthropic.claude-3-sonnet-20240229-v1:0`
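Once model access is enabled, you can sanity-check it with a minimal `boto3` call. The following sketch is not part of this repository; it assumes the default credential chain and the Titan Text Embeddings V2 request format (`inputText`, optional `dimensions` and `normalize` fields):

```python
import json


def titan_v2_request(text: str, dimensions: int = 1024, normalize: bool = True) -> str:
    """Build the JSON request body for amazon.titan-embed-text-v2:0."""
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,   # Titan V2 supports 256, 512, or 1024
        "normalize": normalize,     # return unit-length vectors
    })


def embed(text: str) -> list[float]:
    """Invoke the Titan V2 embedding model on Amazon Bedrock.

    Requires valid AWS credentials and enabled model access.
    """
    import boto3  # imported lazily; only needed for the actual call

    client = boto3.client("bedrock-runtime")  # region taken from your AWS config
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=titan_v2_request(text),
    )
    return json.loads(response["body"].read())["embedding"]
```

If model access is missing, the `invoke_model` call fails with an `AccessDeniedException`, which makes this a quick way to verify the console setting.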
The following software tools are required in order to deploy this repository:
```shell
❯ terraform --version
Terraform v1.8.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v5.50.0
+ provider registry.terraform.io/hashicorp/external v2.3.3
+ provider registry.terraform.io/hashicorp/local v2.5.1
+ provider registry.terraform.io/hashicorp/null v3.2.2

❯ docker --version
Docker version 26.0.0, build 2ae903e86c

❯ poetry --version
Poetry (version 1.7.1)
```
This section explains how to deploy the infrastructure and how to run the demo in a Jupyter notebook.
Warning: The following actions will incur costs in the deployed AWS account.
To deploy this sample, set your AWS credentials as environment variables or configure the AWS CLI directly. To verify that the credentials are set correctly, run `aws sts get-caller-identity`. The output should contain the ARN of the user or role that you are signed in as.
To deploy the entire infrastructure, run the following commands:
```shell
cd terraform
terraform init
terraform plan -var-file=commons.tfvars
terraform apply -var-file=commons.tfvars
```
The end-to-end demo is presented inside the Jupyter notebook. Follow the steps below to run the demo by yourself.
The infrastructure deployment provisions an Amazon SageMaker notebook instance inside the VPC and with the permissions to access the PostgreSQL Aurora database. Once the previous infrastructure deployment has succeeded, follow the subsequent steps to run the demo in a Jupyter notebook:
- Log in to the AWS Management Console of the account where the infrastructure is deployed.
- Open the SageMaker notebook instance `aws-sample-bedrock-rag-template`.
- Move the `rag_demo.ipynb` Jupyter notebook onto the SageMaker notebook instance via drag & drop.
- Open `rag_demo.ipynb` on the SageMaker notebook instance and choose the `conda_python3` kernel.
- Run the cells of the notebook to run the demo.
The Jupyter notebook guides the reader through the following process:
- Installing requirements
- Embedding definition
- Database connection
- Data ingestion
- Retrieval augmented text generation
- Relevant document queries
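The retrieval-augmented generation step in the notebook boils down to placing the retrieved chunks into the prompt context before invoking Claude 3. A self-contained sketch of that prompt assembly (the function name and template are illustrative, not the notebook's actual code):

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved document chunks and the user question into one prompt.

    Illustrative only: each chunk is wrapped in a <document> tag so the
    model can tell retrieved context apart from the question itself.
    """
    context = "\n\n".join(
        f"<document>\n{chunk}\n</document>" for chunk in retrieved_chunks
    )
    return (
        "Answer the question using only the documents below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Chunks would normally come from a similarity search against Aurora PostgreSQL.
prompt = build_rag_prompt(
    "What is the refund policy?",
    ["Refunds are issued within 30 days of purchase."],
)
```

The assembled prompt is then sent to Claude 3 via the `bedrock-runtime` client, so the answer is grounded in the ingested documents rather than the model's training data alone.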
To destroy the infrastructure, run `terraform destroy -var-file=commons.tfvars`.
Make sure that the dependencies in the `pyproject.toml` are aligned with the requirements of the AWS Lambda function `data-ingestion-processor`.
Install the dependencies and activate the virtual environment:
```shell
poetry lock
poetry install
poetry shell
python -m pytest .
```
There are two possible ways to deploy this stack to AWS Regions other than `us-east-1` and `us-west-1`. You can configure the deployment AWS Region in the `commons.tfvars` file. For cross-region foundation model access, consider the following options:

- Traversing the public internet: if the traffic may traverse the public internet, add internet gateways to the VPC and adjust the security groups assigned to the AWS Lambda function `data-ingestion-processor` and the SageMaker notebook instance to allow outbound traffic to the public internet.
- NOT traversing the public internet: deploy this sample to any AWS Region other than `us-east-1` or `us-west-1`. In `us-east-1` or `us-west-1`, create an additional VPC including a VPC endpoint for `bedrock-runtime`. Then, connect that VPC to the application VPC using VPC peering or a transit gateway. Lastly, when configuring the `bedrock-runtime` boto3 client in any AWS Lambda function outside of `us-east-1` or `us-west-1`, pass the private DNS name of the VPC endpoint for `bedrock-runtime` in `us-east-1` or `us-west-1` as `endpoint_url` to the boto3 client. For the VPC peering solution, one can leverage the module Terraform AWS VPC Peering.
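As a sketch, pointing the `bedrock-runtime` client at a peered VPC endpoint might look like the following. The endpoint DNS name below is a placeholder; you would read the real one from the VPC endpoint created in `us-east-1` or `us-west-1`:

```python
def bedrock_client_kwargs(endpoint_dns: str, region: str = "us-east-1") -> dict:
    """Build keyword arguments for a cross-region bedrock-runtime client."""
    return {
        "service_name": "bedrock-runtime",
        "region_name": region,  # the Region hosting the VPC endpoint
        "endpoint_url": f"https://{endpoint_dns}",
    }


def make_client(endpoint_dns: str):
    """Create a boto3 client that sends Bedrock traffic through the peered VPC endpoint."""
    import boto3  # imported lazily; requires AWS credentials when used

    # Placeholder DNS name, e.g.:
    # "vpce-0123example.bedrock-runtime.us-east-1.vpce.amazonaws.com"
    return boto3.client(**bedrock_client_kwargs(endpoint_dns))
```

With this configuration, the Lambda function's Bedrock calls resolve to the private endpoint in the peered VPC instead of the public `bedrock-runtime` endpoint, so the traffic stays on the AWS network.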
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.