dbt-serverless

Run dbt serverless in the Cloud (AWS)

Requirements

  • aws credentials configured in ~/.aws/credentials
  • aws cli
    pip install awscli
    
  • terraform

Deploy

The infrastructure is based on terraform. I set up a terraform backend to keep the terraform state. The backend is based on an S3 bucket that was created manually. You can create the S3 bucket by simply running:

aws s3api create-bucket --bucket nicor88-eu-west-1-terraform --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1

Remember to change the name of the S3 bucket inside infrastructure/provider.tf before running the following commands:

export AWS_PROFILE=your_profile
make infra-plan
make infra-apply

After the infra is created correctly, you can push a new image to the ECR repository by running:

make push-to-ecr AWS_ACCOUNT_ID=your_account_id

Note

Currently Aurora Postgres is only accessible inside the VPC. I created a Network Load Balancer to connect to the DB from anywhere, but you need to get the private IPs of the Aurora endpoint. You can simply run:

nslookup your_aurora_endpoint
# returned from the terraform outputs
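If you prefer, the same lookup can be done with a short Python sketch; the endpoint hostname below is a placeholder, take the real one from the terraform outputs:

import socket

# Placeholder: replace with the Aurora endpoint from the terraform outputs
aurora_endpoint = "your_aurora_endpoint"

# gethostbyname_ex returns (hostname, aliases, ip_addresses)
_, _, private_ips = socket.gethostbyname_ex(aurora_endpoint)
print(private_ips)  # typically one private IP per subnet/AZ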

Then you need to replace these two variables:

  • autora_postgres_serverless_private_ip_1
  • autora_postgres_serverless_private_ip_2

and apply the changes again with make infra-apply

Infrastructure

AWS Step Function

Input example

{
  "commands1": [
    "dbt",
    "run",
    "--models",
    "example"
  ],
  "commands2": [
    "dbt",
    "run",
    "--models",
    "just_another_example"
  ]
}
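As a sketch, you could start an execution of the Step Function with such an input via boto3; the state machine ARN below is a placeholder, take the real one from the terraform outputs:

import json
import boto3

sfn = boto3.client("stepfunctions", region_name="eu-west-1")

# Placeholder ARN: take the real one from the terraform outputs
state_machine_arn = "arn:aws:states:eu-west-1:123456789012:stateMachine:dbt-serverless"

execution_input = {
    "commands1": ["dbt", "run", "--models", "example"],
    "commands2": ["dbt", "run", "--models", "just_another_example"],
}

response = sfn.start_execution(
    stateMachineArn=state_machine_arn,
    input=json.dumps(execution_input),
)
print(response["executionArn"])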

Airflow operator

It's also possible to invoke ECS Fargate containers to run dbt from Airflow. Here is an example of how to call the DbtOperator from Airflow:

dbt_run_example = DbtOperator(
    dag=dag,
    task_id='dbt_example',
    command='run',                           # dbt command to execute
    target='dev',                            # dbt target profile
    dbt_models='my_example',                 # dbt models to run
    subnets=['subnet_id_1', 'subnet_id_2'],  # subnets for the Fargate task
    security_groups=['sg_1']                 # security groups for the Fargate task
)
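For context, a minimal DAG around that task could look like the sketch below. The import path of DbtOperator is an assumption, adjust it to wherever the operator lives in your project:

from datetime import datetime

from airflow import DAG

# Assumption: adjust this import to the actual location of DbtOperator in your codebase
from operators.dbt_operator import DbtOperator

dag = DAG(
    dag_id="dbt_serverless_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
)

dbt_run_example = DbtOperator(
    dag=dag,
    task_id='dbt_example',
    command='run',
    target='dev',
    dbt_models='my_example',
    subnets=['subnet_id_1', 'subnet_id_2'],
    security_groups=['sg_1'],
)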