Starter for an end-to-end, serverless AWS pipeline for a batch model. It supports ECS or Kubernetes for model execution and Spark running on EMR for ETL.
The infrastructure is provisioned automatically using IaC and CI/CD. You implement the model, ETL, and validation on top of the starter code, and changes deploy automatically to your cluster.
Status | Description |
---|---|
❌ | AWS Step Function-based orchestration |
❌ | Lambda stubs for input and output validation |
❌ | CloudWatch-based scheduled events for execution |
❌ | Dockerized model execution in ECS |
❌ | Dockerized model execution in EKS |
❌ | ETL using Spark on EMR |
✅ | Input and output stored in S3 |
✅ | Terraform for infrastructure management |
❌ | Jenkins CI/CD pipeline |
The following tools are necessary for local development and for some setup that must happen before the CI/CD pipeline is configured.
Note that our CI/CD pipeline will automatically install these dependencies on the worker nodes.
- (Recommended) Install and configure AWS CLI. (Tested with version `1.16.304`.)
- Install and configure Terraform. (Tested with version `0.12.19`.)
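Before continuing, it can help to confirm that both tools are on your `PATH`. The following is a minimal sketch, not part of this repository; the helper name `check_tool` is hypothetical.

```shell
#!/usr/bin/env sh
# check_tool is a hypothetical helper, not part of this repository.
# It reports whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Report on each prerequisite; '|| true' keeps the loop going so both are reported.
for tool in aws terraform; do
  check_tool "$tool" || true
done
```

If both are found, `aws --version` and `terraform version` print the installed versions, which you can compare with the tested versions listed above.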
Before setting up the automated deployment pipeline, you need to set up Terraform so that it can use S3 to store shared state. This enables everyone working on your project to contribute infrastructure changes.
Though the bucket will be managed by Terraform, it cannot be created by Terraform because Terraform is dependent on the shared state that will be stored in the bucket; it's a catch-22. The workaround is to first create the S3 bucket, then initialize Terraform (which creates the shared state at `terraform/storage/terraform.tfstate`), and afterwards manually import the bucket into Terraform.
- Manually create an S3 bucket, which will be used for storing Terraform state, as well as input and output data.
  - E.g., using AWS CLI:
    `aws s3 mb s3://my-bucket --region us-east-1`
- Import the bucket so that it is under Terraform control.
  - Open a terminal and change directory:
    `cd storage/terraform`
  - Initialize Terraform locally:
    `terraform init -backend-config="bucket=my-bucket" -backend-config="region=us-east-1"`
  - Import the bucket you just created so that it is managed by Terraform:
    `terraform import -var="bucket=my-bucket" -var="region=us-east-1" aws_s3_bucket.bucket my-bucket`
  - (Recommended) Confirm Terraform is correctly set up by running plan:
    `terraform plan -var="region=us-east-1" -var="bucket=my-bucket"`
    If successful, you'll see a view that shows which changes will be made. To apply them now:
    `terraform apply -auto-approve -var="region=us-east-1" -var="bucket=my-bucket"`
- (Recommended) Confirm Terraform shared state was successfully created within the bucket.
  - E.g., using AWS CLI:
    `aws s3 ls s3://my-bucket/terraform/storage/terraform.tfstate`
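The bootstrap steps above can be sketched as a single script. This is an illustration under stated assumptions rather than a file shipped with this starter: the names `BUCKET`, `REGION`, `DRY_RUN`, `run`, and `bootstrap` are hypothetical, and the script assumes it is started from the repository root. By default it only prints the commands it would run.

```shell
#!/usr/bin/env sh
# Sketch of the bootstrap steps as one script.
# BUCKET, REGION, DRY_RUN, run, and bootstrap are hypothetical names for illustration.
BUCKET="${BUCKET:-my-bucket}"
REGION="${REGION:-us-east-1}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands without executing them

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

bootstrap() {
  # 1. Create the bucket by hand, since Terraform cannot create its own backend.
  run aws s3 mb "s3://$BUCKET" --region "$REGION" || return 1
  # 2. Move to the Terraform configuration for storage.
  run cd storage/terraform || return 1
  # 3. Initialize Terraform against the bucket.
  run terraform init \
    -backend-config="bucket=$BUCKET" \
    -backend-config="region=$REGION" || return 1
  # 4. Bring the existing bucket under Terraform management.
  run terraform import \
    -var="bucket=$BUCKET" \
    -var="region=$REGION" \
    aws_s3_bucket.bucket "$BUCKET" || return 1
  # 5. Confirm the shared state object exists in the bucket.
  run aws s3 ls "s3://$BUCKET/terraform/storage/terraform.tfstate"
}

bootstrap
```

Setting `DRY_RUN=0` in the environment executes the commands instead of printing them. Each step stops the sequence on failure, so a failed bucket creation does not lead to a Terraform import against a bucket that does not exist.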
TODO
TODO
Directory/File | Description |
---|---|
storage/ | Manages S3 for input, outputs, and Terraform state management |