Deploying a Multi-Model and Multi-RAG Powered Chatbot Using AWS CDK on AWS

Table of contents

Features
Precautions
Requirements
Deploy
Clean up
Authors
Credits
License

Features

Modular, comprehensive and ready to use

This solution provides ready-to-use code so you can start experimenting with a variety of Large Language Models and Multimodal Language Models, settings and prompts in your own AWS account.

Supported model providers:

Amazon Bedrock
Amazon SageMaker self-hosted models from Foundation, Jumpstart and HuggingFace.
Third-party providers via API such as Anthropic, Cohere, AI21 Labs, OpenAI, etc. See available langchain integrations for a comprehensive list.

Experiment with multimodal models

Deploy IDEFICS models on Amazon SageMaker and see how the chatbot can answer questions about images, describe visual content, and generate text grounded in multiple images.

Currently, the following multimodal models are supported:

IDEFICS 9b Instruct
- Requires ml.g5.12xlarge instance.
IDEFICS 80b Instruct
- Requires ml.g5.48xlarge instance.

To find out which instance types you need and how to request them, read Amazon SageMaker requirements.

NOTE: Make sure to review IDEFICS models license sections.

To deploy a multimodal model, follow the deploy instructions, select one of the supported models (press Space to select/deselect) in the magic-create CLI step, and then deploy as instructed in the Deploy section.

⚠️ NOTE ⚠️ Amazon SageMaker endpoints are billed by the hour. Do not leave this model running unused, to avoid unnecessary costs.

Multi-Session Chat: evaluate multiple models at once

Send the same query to 2 to 4 separate models at once and see how each one responds based on its own learned history, context and access to the same powerful document retriever, so all requests can pull from the same up-to-date knowledge.
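
As a rough illustration of the fan-out idea described above, the sketch below sends one prompt to several models concurrently and collects the replies by name. It is not the solution's implementation: `fan_out`, `echo_model` and `shout_model` are hypothetical names, and the two "models" are plain functions standing in for real model clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fan_out(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send one prompt to every model concurrently and return replies keyed by model name."""
    if not 2 <= len(models) <= 4:
        raise ValueError("expected between 2 and 4 models")
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # Submit all calls first so they run in parallel, then gather the results.
        futures = {name: pool.submit(call, prompt) for name, call in models.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for real model clients.
def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"


def shout_model(prompt: str) -> str:
    return prompt.upper()


if __name__ == "__main__":
    replies = fan_out("What is RAG?", {"echo": echo_model, "shout": shout_model})
    for name, reply in replies.items():
        print(f"{name}: {reply}")
```

Each model keeps its own session in the real UI; this sketch only shows the "same query, several models" part.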

Experiment with multiple RAG options with Workspaces

A workspace is a logical namespace where you can upload files for indexing and storage in one of the vector databases. You can select the embeddings model and text-splitting configuration of your choice.
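
To make "text-splitting configuration" more concrete, here is a minimal sketch of a character-based splitter with a chunk size and an overlap. The function name and parameters are assumptions chosen for illustration; the splitter actually used by a workspace may work differently (for example, by splitting on separators or tokens).

```python
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character chunks that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text to avoid redundant tail chunks.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

A larger overlap keeps more context shared between neighbouring chunks at the cost of indexing more text.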

Unlock RAG potentials with Workspaces Debugging Tools

The solution comes with several debugging tools to help you debug RAG scenarios:

Run RAG queries without chatbot and analyse results, scores, etc.
Test different embeddings models directly in the UI
Test cross encoders and analyse distances from different functions between sentences.
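
The "distances from different functions" mentioned above can be illustrated with three common measures between embedding vectors. This is a generic sketch using only the standard library; the function names are illustrative and the set of measures exposed in the UI is not confirmed here.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; smaller means more similar."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of the normalized vectors; 1.0 means same direction."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return inner_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    first, second = [1.0, 0.0], [0.0, 1.0]
    print(inner_product(first, second), l2_distance(first, second), cosine_similarity(first, second))
```

Note that the measures rank results in opposite directions: higher is better for inner product and cosine similarity, lower is better for L2 distance.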

Full-fledged User Interface

The repository includes a CDK construct to deploy a full-fledged UI built with React to interact with the deployed LLMs/MLMs as chatbots. The UI is hosted on Amazon S3 and distributed with Amazon CloudFront.

It is protected with Amazon Cognito authentication and lets you interact and experiment with multiple LLMs/MLMs and multiple RAG engines, with support for conversational history and document upload/progress.

The interface layer between the UI and the backend is built with an Amazon API Gateway REST API for management requests and Amazon API Gateway WebSocket APIs for chatbot messages and responses.

Design system provided by AWS Cloudscape Design System.

⚠️ Precautions ⚠️

Before you begin using the solution, there are certain precautions you must take into account:

Cost Management with self-hosted models on SageMaker: Be mindful of the costs associated with AWS resources, especially with SageMaker models billed by the hour. While the sample is designed to be cost-effective, leaving serverful resources running for extended periods or deploying numerous LLMs/MLMs can quickly lead to increased costs.
Licensing obligations: If you choose to use any datasets or models alongside the provided samples, ensure you check the LLM code and comply with all licensing obligations attached to them.
This is a sample: the code provided in this repository shouldn't be used for production workloads without further reviews and adaptation.

Amazon SageMaker requirements (for self-hosted models only)

Instance type quota increase

If you are looking to self-host models on Amazon SageMaker, you'll likely need to request an increase in service quota for specific SageMaker instance types, such as the ml.g5 instance type. This will give you access to the latest generation of GPU/multi-GPU instance types. You can do this from the AWS console.
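
Besides the console, current quota values can also be inspected programmatically. The sketch below is an assumption-laden example, not part of the solution: it takes a boto3 Service Quotas client and returns the SageMaker quotas whose name mentions a given instance type. The helper name `find_instance_quotas` is hypothetical, and it assumes quota names contain the instance type string (for example, a name mentioning ml.g5.12xlarge).

```python
from typing import Any, Dict, List


def find_instance_quotas(client: Any, instance_type: str) -> List[Dict[str, Any]]:
    """Return SageMaker quotas whose name mentions instance_type.

    `client` is expected to behave like boto3.client("service-quotas").
    """
    matches: List[Dict[str, Any]] = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        for quota in page["Quotas"]:
            # Assumption: the instance type appears verbatim in the quota name.
            if instance_type in quota["QuotaName"]:
                matches.append(
                    {
                        "name": quota["QuotaName"],
                        "code": quota["QuotaCode"],
                        "value": quota["Value"],
                    }
                )
    return matches


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   for quota in find_instance_quotas(boto3.client("service-quotas"), "ml.g5.12xlarge"):
#       print(quota)
```

A value of 0 for the endpoint-usage quota of an instance type suggests an increase request is needed before deploying a model on it.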

Amazon Bedrock requirements

Base Models Access

If you are looking to interact with models from Amazon Bedrock, you need to request access to the base models in one of the regions where Amazon Bedrock is available. Make sure to read and accept models' end-user license agreements or EULA.

Note:

You can deploy the solution to a different region from where you requested Base Model access.
While the Base Model access approval is instant, it might take several minutes to get access and see the list of models in the UI.

Third-party models requirements

You can also interact with external providers via their API, such as AI21 Labs, Cohere, OpenAI, etc.

The provider must be supported in the Model Interface; see available langchain integrations for a comprehensive list of providers.

Usually, an API key is required to integrate with third-party (3P) models. To support this, the Model Interface deploys a secret in AWS Secrets Manager, initially containing an empty JSON object {}, where you can add your API keys for one or more providers.

These keys will be injected at runtime into the Lambda function Environment Variables; they won't be visible in the AWS Lambda Console.

For example, if you wish to interact with the AI21 Labs, OpenAI and Cohere endpoints:

Open the Model Interface Keys Secret in Secrets Manager. You can find the secret name in the stack output, too.
Update the secret by adding one key per provider to the JSON:

{
  "AI21_API_KEY": "xxxxx",
  "OPENAI_API_KEY": "sk-xxxxxxxxxxxxxxx",
  "COHERE_API_KEY": "xxxxx"
}

N.B.: If no keys are needed, the secret value must be an empty JSON object {}, NOT an empty string ''.

Make sure that each key name matches the environment variable expected by the framework in use, such as LangChain (see available langchain integrations).
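
Because the secret value must be a valid JSON object, it can help to check the text before saving it. The following is a small sketch of such a check; `parse_api_keys` is a hypothetical helper, not something shipped with the solution.

```python
import json
from typing import Dict


def parse_api_keys(secret_value: str) -> Dict[str, str]:
    """Validate a secret value and return it as a mapping of variable name to API key."""
    try:
        parsed = json.loads(secret_value)
    except json.JSONDecodeError as error:
        # An empty string or a trailing comma ends up here.
        raise ValueError(f"secret is not valid JSON: {error}") from error
    if not isinstance(parsed, dict):
        raise ValueError("secret must be a JSON object, for example {}")
    for name, value in parsed.items():
        if not isinstance(value, str):
            raise ValueError(f"value for {name} must be a string")
    return parsed


if __name__ == "__main__":
    print(parse_api_keys("{}"))
    print(sorted(parse_api_keys('{"AI21_API_KEY": "xxxxx", "COHERE_API_KEY": "xxxxx"}')))
```

Standard JSON does not allow a comma after the last entry, so a value ending in `"xxxxx",}` is rejected by this check.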

Deploy

Environment setup

Deploy with AWS Cloud9

We recommend deploying with AWS Cloud9. If you'd like to use Cloud9 to deploy the solution, configure the environment as follows before proceeding:

Select at least m5.large as the instance type.
Use Ubuntu Server 22.04 LTS as the platform.

Deploy with GitHub Codespaces

If you'd like to use GitHub Codespaces to deploy the solution, you will need the following before proceeding:

An AWS account
An IAM User with:

AdministratorAccess policy granted to your user (for production, we recommend restricting access as needed)
Take note of Access key and Secret access key.

To get started, click on the button below.

Once in the Codespaces terminal, set up the AWS Credentials by running

aws configure

AWS Access Key ID [None]: <the access key from the IAM user generated above>
AWS Secret Access Key [None]: <the secret access key from the IAM user generated above>
Default region name: <the region you plan to deploy the solution to>
Default output format: json

You are all set for deployment; you can now jump to step 3 of the Deployment section below.

Local deployment

If you have decided not to use AWS Cloud9 or GitHub Codespaces, verify that your environment satisfies the following prerequisites:

You have:

An AWS account
AdministratorAccess policy granted to your AWS account (for production, we recommend restricting access as needed)
Both console and programmatic access
NodeJS 16 or 18 installed
- If you are using nvm, you can run one of the following before proceeding:
```
nvm install 16 && nvm use 16
# or
nvm install 18 && nvm use 18
```
AWS CLI installed and configured to use with your AWS account
TypeScript 3.8+ installed
AWS CDK CLI installed
Docker installed
- N.B. buildx is also required. For Windows and macOS, buildx is included in Docker Desktop.
Python 3+ installed
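
The tool prerequisites above can be checked quickly before starting. The script below is a minimal sketch that only tests whether executables are on the PATH; it does not verify versions. The executable names in `REQUIRED_TOOLS` are assumptions (for instance, Python may be installed as `python` rather than `python3` on some systems).

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for the prerequisites listed above.
REQUIRED_TOOLS = ("node", "aws", "tsc", "cdk", "docker", "python3")


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All listed tools were found on the PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.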

Deployment

1. Clone the repository

git clone https://github.com/aws-samples/aws-genai-llm-chatbot

2. Move into the cloned repository

cd aws-genai-llm-chatbot

(Optional) Only for Cloud9

If you use Cloud9, increase the instance's EBS volume to at least 100GB. To do this, run the following command from the Cloud9 terminal:

./scripts/cloud9-resize.sh

See the documentation for more details on environment resize.

3. Install the project dependencies and build the project by running this command:

npm install && npm run build

4. Once done, run the magic-create CLI to help you set up the solution with the features you care about most:

npm run create

You'll be prompted to configure the different aspects of the solution, such as:

The LLMs or MLMs to enable (we support all models provided by Bedrock along with SageMaker-hosted Idefics, FalconLite, Mistral, and more to come).
Setup of the RAG system: engine selection (e.g. Aurora with pgvector, OpenSearch, Kendra), embeddings selection, and more to come.

When done, answer Y to create a new configuration.

Your configuration is now stored under bin/config.json. You can re-run the magic-create command as needed to update your config.json.

5. (Optional) Bootstrap AWS CDK on the target account and region

Note: This is required if you have never used AWS CDK on this account and region combination. (More information on CDK bootstrapping).

npx cdk bootstrap aws://{targetAccountId}/{targetRegion}

6. You can now deploy by running:

npx cdk deploy

Note: The duration of this step can vary greatly, depending on the constructs you are deploying.

You can view the progress of your CDK deployment in the CloudFormation console in the selected region.

Once deployed, take note of the User Interface domain name, the User Pool link and, if you want to interact with third-party model providers, the name of the secret that will hold the various API keys.

...
Outputs:
GenAIChatBotStack.UserInterfaceUserInterfaceDomanNameXXXXXXXX = dxxxxxxxxxxxxx.cloudfront.net
GenAIChatBotStack.AuthenticationUserPoolLinkXXXXX = https://xxxxx.console.aws.amazon.com/cognito/v2/idp/user-pools/xxxxx_XXXXX/users?region=xxxxx
GenAIChatBotStack.ApiKeysSecretNameXXXX = ApiKeysSecretName-xxxxxx
...

7. Open the generated Cognito User Pool link from the outputs above, e.g. https://xxxxx.console.aws.amazon.com/cognito/v2/idp/user-pools/xxxxx_XXXXX/users?region=xxxxx
8. Add a user that will be used to log in to the web interface.
9. Open the User Interface URL from the outputs above, e.g. dxxxxxxxxxxxxx.cloudfront.net
10. Log in with the user created in step 8; you will be asked to change the password.
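
If you save the deployment log, the values shown in the Outputs block can be extracted with a few lines of code instead of copying them by hand. This is a sketch under the assumption that each output is printed as `Stack.OutputName = value`, as in the sample above; `parse_outputs` and `find_output` are hypothetical helper names.

```python
from typing import Dict, Iterable, Optional


def parse_outputs(lines: Iterable[str]) -> Dict[str, str]:
    """Collect 'Stack.OutputName = value' lines into a dictionary."""
    outputs: Dict[str, str] = {}
    for line in lines:
        # Split on the first " = " only, so "=" inside URLs is left untouched.
        key, separator, value = line.partition(" = ")
        if separator and "." in key:
            outputs[key.strip()] = value.strip()
    return outputs


def find_output(outputs: Dict[str, str], fragment: str) -> Optional[str]:
    """Return the first output value whose key contains fragment, or None."""
    for key, value in outputs.items():
        if fragment in key:
            return value
    return None


if __name__ == "__main__":
    sample = [
        "Outputs:",
        "GenAIChatBotStack.ApiKeysSecretNameXXXX = ApiKeysSecretName-xxxxxx",
    ]
    print(find_output(parse_outputs(sample), "ApiKeysSecretName"))
```

Matching on a fragment is used because the output names carry generated suffixes that differ between deployments.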

Run user interface locally

See instructions in the README file of the lib/user-interface/react-app folder.

Clean up

You can remove the stacks and all the associated resources created in your AWS account by running the following command:

npx cdk destroy

Note: Depending on which resources have been deployed, destroying the stack might take a while, up to 45 minutes. If the deletion fails multiple times, manually delete the stack's remaining ENIs (you can filter ENIs by VPC, subnet, etc. using the search bar in the AWS console) and re-attempt the stack deletion.
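
As an alternative to searching in the console, leftover ENIs can be listed with the EC2 API. The sketch below only lists network interfaces in a VPC and deletes nothing; `list_enis` is a hypothetical helper, and the VPC ID in the usage comment is a placeholder you would replace with the VPC created by the stack.

```python
from typing import Any, Dict, List


def list_enis(client: Any, vpc_id: str) -> List[Dict[str, str]]:
    """List the network interfaces in a VPC with their status.

    `client` is expected to behave like boto3.client("ec2").
    """
    paginator = client.get_paginator("describe_network_interfaces")
    pages = paginator.paginate(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])
    return [
        {"id": eni["NetworkInterfaceId"], "status": eni["Status"]}
        for page in pages
        for eni in page["NetworkInterfaces"]
    ]


# Usage (requires boto3 and configured AWS credentials; the VPC ID is a placeholder):
#   import boto3
#   for eni in list_enis(boto3.client("ec2"), "vpc-xxxxxxxx"):
#       print(eni["id"], eni["status"])
```

Review each interface before deleting it; one that is still reported as in use is attached to a resource that should be removed first.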

Architecture

This repository comes with several reusable CDK constructs, giving you the freedom to decide what to deploy and what not to.

Here's an overview:

Authors

Credits

This sample was made possible thanks to the following libraries:

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Changelog of the project.
License of the project.
Code of Conduct of the project.
CONTRIBUTING for more information.
