Welcome to DataDialogue! This package deploys an AWS Glue Data Catalog that you can query in natural language. A LangChain-based Lambda function uses the OpenAI API to answer your questions by running SQL queries against the catalog through an Amazon Athena connection. It also deploys an API Gateway endpoint that uses Amazon Cognito for authentication.
Follow the instructions below to get your DataDialogue environment up and running.
Prerequisites:
- Docker installed on your machine
- An AWS account, with the AWS CLI installed and configured
- Terraform installed
- An OpenAI API key with access to GPT-4
In the commands below, replace <account_id> with your AWS account ID. The commands use us-east-1; to deploy to a different AWS region, replace us-east-1 everywhere it appears.
docker build -t aws-python3.11-sqllangchain:local -f Dockerfile.python3.11 .
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account_id>.dkr.ecr.us-east-1.amazonaws.com
aws ecr create-repository --repository-name aws-python3.11-sqllangchain --region us-east-1 --image-scanning-configuration scanOnPush=true --image-tag-mutability MUTABLE
docker tag aws-python3.11-sqllangchain:local <account_id>.dkr.ecr.us-east-1.amazonaws.com/aws-python3.11-sqllangchain:latest
docker push <account_id>.dkr.ecr.us-east-1.amazonaws.com/aws-python3.11-sqllangchain:latest
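The image URI used in the tag and push commands follows the pattern `<account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. This is also the value you will set for `image_uri` in the Terraform variables. The sketch below builds the URI from its parts so the account ID and region stay consistent everywhere. `ecr_image_uri` is a hypothetical helper, not part of DataDialogue, and its region check is only a loose sanity test.

```python
import re


def ecr_image_uri(account_id: str, region: str,
                  repository: str = "aws-python3.11-sqllangchain",
                  tag: str = "latest") -> str:
    """Hypothetical helper: build the ECR image URI pushed above."""
    # AWS account IDs are exactly 12 digits.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    # Loose check for region names such as "us-east-1" or "ap-southeast-2".
    if not re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d+", region):
        raise ValueError(f"unexpected region format: {region!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```

For example, `ecr_image_uri("123456789012", "us-east-1")` returns `123456789012.dkr.ecr.us-east-1.amazonaws.com/aws-python3.11-sqllangchain:latest`.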
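Open tf-poc/modules/api_setup/variables.tf and set the image_uri variable to the URI of the container image you just pushed (<account_id>.dkr.ecr.us-east-1.amazonaws.com/aws-python3.11-sqllangchain:latest). Then deploy the infrastructure with Terraform: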
terraform init
terraform plan
terraform apply
Navigate to AWS Secrets Manager in the AWS console, find the OpenAI secret, and update its value with your company's OpenAI GPT-4 API key.
You can test the setup using the AWS Lambda Test console or the API Gateway test console. Below are some payload examples:
{
"query": "What are the top 10 countries with the highest oil price in the last year of data available?"
}
{ "query": "Which states reported the fewest and the most deaths?" }
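You can also send the same payloads programmatically. The sketch below uses boto3's Lambda `invoke` call with the `{"query": ...}` payload shape shown above. The function name is a placeholder, so check the Lambda console for the name Terraform created. The structure of the response depends on the handler, so the sketch returns the decoded JSON without interpreting it. `build_payload` and `ask_lambda` are hypothetical helpers.

```python
import json


def build_payload(query: str) -> bytes:
    """Encode a natural-language question in the {"query": ...} shape."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return json.dumps({"query": query}).encode("utf-8")


def ask_lambda(function_name: str, query: str, region: str = "us-east-1"):
    """Synchronously invoke the Lambda function and return its decoded JSON response."""
    import boto3  # lazy import: build_payload works without boto3 installed

    client = boto3.client("lambda", region_name=region)
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=build_payload(query),
    )
    body = response["Payload"].read()
    # A handler error still returns status 200; Lambda flags it via FunctionError.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda error: {body.decode('utf-8', 'replace')}")
    return json.loads(body)

# Example (placeholder function name):
# print(ask_lambda("<lambda-function-name>", "Which states reported the most deaths?"))
```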
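To call the API from an external application, open Amazon Cognito in the AWS console and create a user in the user pool. Then authenticate as that user to obtain a token, following the Amazon Cognito documentation, and include the token in your API requests.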
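The following is a minimal sketch of the external-app flow, and it rests on several assumptions. It assumes the Cognito app client allows the `USER_PASSWORD_AUTH` flow and has no client secret. It also assumes the API Gateway authorizer reads the ID token from the `Authorization` header, which is the default for a Cognito user pool authorizer, and that the endpoint accepts a POST with the `{"query": ...}` body. The client ID, credentials, and invoke URL are placeholders. A user created in the console usually has to set a permanent password first. Until then, Cognito returns a `NEW_PASSWORD_REQUIRED` challenge instead of tokens, and this sketch does not handle that case.

```python
import json
import urllib.request


def get_id_token(client_id: str, username: str, password: str,
                 region: str = "us-east-1") -> str:
    """Authenticate against the Cognito user pool and return the ID token."""
    import boto3  # lazy import so build_api_request works without boto3

    client = boto3.client("cognito-idp", region_name=region)
    response = client.initiate_auth(
        ClientId=client_id,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
    )
    if "ChallengeName" in response:
        # e.g. NEW_PASSWORD_REQUIRED for users created in the console
        raise RuntimeError(f"Cognito challenge not handled: {response['ChallengeName']}")
    return response["AuthenticationResult"]["IdToken"]


def build_api_request(invoke_url: str, id_token: str, query: str) -> urllib.request.Request:
    """Build a POST request carrying the token and the {"query": ...} body."""
    return urllib.request.Request(
        invoke_url,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Authorization": id_token, "Content-Type": "application/json"},
        method="POST",
    )

# Example (all values are placeholders):
# token = get_id_token("<app-client-id>", "<username>", "<password>")
# req = build_api_request("<api-invoke-url>", token, "Which states reported the most deaths?")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```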
We hope this guide helps you get started with DataDialogue. Feel free to contribute or raise issues!