- Introduction
- Prerequisites
- Target technology stack
- Deployment
- Useful CDK commands
- Code Structure
- Customize the chatbot with your own data
This GenAI chatbot application was built with Amazon Bedrock, including its Knowledge Base and Agent features, alongside other AWS serverless GenAI solutions. The provided solution showcases a chatbot that uses its understanding of EC2 instances and EC2 instance pricing. The chatbot illustrates how Amazon Bedrock can convert natural language into Amazon Athena queries and process and utilize complex data sets. Open source tools such as LlamaIndex augment the system's capabilities for data processing and retrieval. The solution also highlights the integration of several AWS resources: Amazon S3 for storage, Amazon Bedrock Knowledge Base for retrieval augmented generation (RAG), an Amazon Bedrock agent to execute multi-step tasks across data sources, AWS Glue to prepare data, Amazon Athena to run efficient queries, AWS Lambda to invoke the agent, and Amazon ECS to host the containerized front end. Together, these resources enable the chatbot to efficiently retrieve and manage content from databases and documents, demonstrating the capabilities of Amazon Bedrock for building advanced chatbot applications.
- Docker
- AWS CDK Toolkit 2.114.1+, installed and configured. For more information, see Getting started with the AWS CDK in the AWS CDK documentation.
- Python 3.11+, installed and configured. For more information, see Beginners Guide/Download in the Python documentation.
- An active AWS account
- An AWS account bootstrapped by using AWS CDK in us-east-1 or us-west-2, with access to the Claude and Titan Embeddings models enabled in the Amazon Bedrock service.
- Amazon Bedrock
- Amazon OpenSearch Serverless
- Amazon ECS
- AWS Glue
- AWS Lambda
- Amazon S3
- Amazon Athena
- Elastic Load Balancing
To run the app locally, first add a `.env` file to the `code/streamlit-app` folder containing the following:
ACCOUNT_ID = <Your account ID>
AWS_REGION = <Your region>
LAMBDA_FUNCTION_NAME = invokeAgentLambda # Sets the name of the Lambda function called by Streamlit for a response. Currently invokes an agent.
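Once these variables are set, the Streamlit app calls the invoke Lambda by name. A minimal sketch of what such a call could look like with boto3 (the payload field names below are illustrative assumptions, not the app's actual contract; check `code/streamlit-app` for the real one):

```python
import json
import os

def build_invoke_payload(question: str, session_id: str) -> bytes:
    """Serialize the request body sent to the invoke Lambda.

    The field names ("question", "sessionId") are illustrative assumptions;
    check code/streamlit-app for the actual payload shape.
    """
    return json.dumps({"question": question, "sessionId": session_id}).encode("utf-8")

def invoke_agent(question: str, session_id: str = "local-test") -> dict:
    """Invoke the agent Lambda configured in the .env file (requires AWS credentials)."""
    import boto3  # imported lazily so the payload helper stays usable offline

    client = boto3.client("lambda", region_name=os.environ["AWS_REGION"])
    response = client.invoke(
        FunctionName=os.environ["LAMBDA_FUNCTION_NAME"],
        Payload=build_invoke_payload(question, session_id),
    )
    return json.loads(response["Payload"].read())
```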
The `cdk.json` file tells the CDK Toolkit how to execute your app.
This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the `.venv` directory. To create the virtualenv it assumes that there is a `python3` (or `python` for Windows) executable in your path with access to the `venv` package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.
To manually create a virtualenv on macOS and Linux:
$ python3 -m venv .venv
After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.
$ source .venv/bin/activate
If you are on a Windows platform, you would activate the virtualenv like this:
% .venv\Scripts\activate.bat
Once the virtualenv is activated, you can install the required dependencies.
$ pip install -r requirements.txt
To add additional dependencies, for example other CDK libraries, just add them to your `setup.py` file and rerun the `pip install -r requirements.txt` command.
At this point you can now synthesize the CloudFormation template for this code.
$ cdk synth
You will need to bootstrap your environment if this is your first time running the CDK in a particular account and Region.
$ cdk bootstrap
Once the environment is bootstrapped, you can proceed to deploy the stacks.
$ cdk deploy
If this is your first time deploying, the process may take approximately 30-45 minutes to build several Docker images in Amazon ECS (Elastic Container Service). Please be patient until it completes. Afterward, it will start deploying the chatbot-stack, which typically takes about 5-8 minutes.
Once the deployment process is complete, you will see the output of the cdk in the terminal, and you can also verify the status in your CloudFormation console.
You can test the agent either in the AWS console or through the Streamlit app URL listed in the outputs of the chatbot-stack in CloudFormation.
To avoid future costs once you have finished using the solution, delete it either through the console or by executing the following command in the terminal.
$ cdk destroy
You may also need to manually delete the S3 bucket generated by the CDK. Please ensure that all generated resources are deleted to avoid incurring costs.
- `cdk ls` lists all stacks in the app
- `cdk synth` emits the synthesized CloudFormation template
- `cdk deploy` deploys this stack to your default AWS account/region
- `cdk diff` compares the deployed stack with the current state
- `cdk docs` opens the CDK documentation
- `cdk destroy` destroys one or more specified stacks
code # Root folder for code for this solution
├── lambdas # Root folder for all Lambda functions
│ ├── action-lambda # Lambda function that acts as an action for the Amazon Bedrock Agent
│ ├── create-index-lambda # Lambda function that creates the Amazon OpenSearch Serverless index used as the Amazon Bedrock Knowledge Base's vector database
│ ├── invoke-lambda # Lambda function that invokes the Amazon Bedrock Agent, called directly from the Streamlit app
│ └── update-lambda # Lambda function that updates/deletes resources after the AWS resources are deployed via AWS CDK
├── layers # Root folder for all Lambda layers
│ ├── boto3_layer # Boto3 layer shared across all Lambda functions
│ └── opensearch_layer # OpenSearch layer that installs all dependencies for creating the Amazon OpenSearch Serverless index
├── streamlit-app # Streamlit app that interacts with the Amazon Bedrock Agent
└── code_stack.py # AWS CDK stack that deploys all AWS resources
To integrate your custom data for deploying the solution, follow these structured guidelines:
- Locate the `assets/knowledgebase_data_source/` directory.
- Place your dataset within this folder.
- Access the `cdk.json` file.
- Navigate to the `context/configure/paths/knowledgebase_file_name` field and update it accordingly.
- Further, modify the `bedrock_instructions/knowledgebase_instruction` field in the `cdk.json` file to accurately reflect the nuances and context of your new dataset.
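For example, the relevant `cdk.json` context entries could look like the fragment below (the file name and instruction text are placeholders for illustration; the nesting follows the field paths above):

```json
{
  "context": {
    "configure": {
      "paths": {
        "knowledgebase_file_name": "my_dataset.pdf"
      },
      "bedrock_instructions": {
        "knowledgebase_instruction": "Use this knowledge base to answer questions about <your data>."
      }
    }
  }
}
```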
- Within the `assets/data_query_data_source/` directory, create a subdirectory, for example `tabular_data`.
- Deposit your structured dataset (acceptable formats include CSV, JSON, ORC, and Parquet) into this newly created subfolder.
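For example, assuming a CSV dataset (the file name and columns below are placeholders for illustration):

```shell
# Create the subfolder for structured data and drop a sample CSV into it
mkdir -p assets/data_query_data_source/tabular_data
printf 'instance_type,hourly_price\nt3.micro,0.0104\n' > assets/data_query_data_source/tabular_data/sample.csv
```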
- If you are connecting to your existing database, update the function `create_sql_engine()` in `code/lambda/action-lambda/build_query_engine.py` to connect to your database.
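As a sketch, assuming the query engine is built on SQLAlchemy (check `build_query_engine.py` for the actual signature), pointing it at your own database can be as simple as swapping the connection URL:

```python
from sqlalchemy import create_engine, text

def create_sql_engine(connection_url: str):
    """Return a SQLAlchemy engine for the given database URL.

    connection_url is a standard SQLAlchemy URL, e.g.
    "postgresql+psycopg2://user:pass@host:5432/mydb" (placeholder values).
    """
    return create_engine(connection_url)
```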
- Update the `cdk.json` file's `context/configure/paths/athena_table_data_prefix` field to align with the new data path.
- Revise `code/lambda/action-lambda/dynamic_examples.csv` by incorporating new text-to-SQL examples that correspond with your dataset.
- Revise `code/lambda/action-lambda/prompt_templates.py` to mirror the attributes of your new tabular data.
- Modify the `cdk.json` file's `context/configure/bedrock_instructions/action_group_description` field to describe the purpose and functionality of the action Lambda tailored for your dataset.
- Reflect the new functionalities of your action Lambda in the `assets/agent_api_schema/artifacts_schema.json` file.
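Bedrock agent action groups describe their actions with an OpenAPI schema. A minimal illustrative fragment (the path, operation, and field names here are placeholders, not the project's actual schema) might look like:

```json
{
  "openapi": "3.0.0",
  "info": { "title": "Action Lambda API", "version": "1.0.0" },
  "paths": {
    "/query": {
      "post": {
        "description": "Answer a question by querying the tabular dataset",
        "operationId": "queryData",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "question": { "type": "string", "description": "Natural-language question" }
                }
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Query answer",
            "content": {
              "application/json": { "schema": { "type": "object" } }
            }
          }
        }
      }
    }
  }
}
```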
- In the `cdk.json` file, under the `context/configure/bedrock_instructions/agent_instruction` section, provide a comprehensive description of the Amazon Bedrock Agent's intended functionality and design purpose, taking into account the newly integrated data.
These steps are designed to ensure a seamless and efficient integration process, enabling you to deploy the solution effectively with your bespoke data.