📚 A course brought to you by the Data Minded Academy.
These are the exercises used in the course Data Pipeline Part 2 at DSTI.
The course has been developed by instructors at Data Minded. The
exercises are meant to be completed in the lexicographical order determined by
name of their parent folders. That is, exercises inside the folder b_foo
should be completed before those in c_bar
, but both should come after those
of a_foo_bar
.
- Introduce good data engineering practices.
- Illustrate better infrastructure management with Terraform and GitHub Actions.
- Illustrate Terraform concepts, like modules, tfstate & providers. Not limited to these three though.
- Use GitHub Actions to automate your deployments.
- People interesteed on DevOps practices applied to Data Engineering or soon to be working with Terraform/GitHub Actions.
- Familiar with AWS and IaC
Lecturer first sets the foundations right for Infrastructure as Code and gradually builds up to Terraform and GitHub Actions.
There is a high degree of participation expected from the students: they will need to write code themselves and reason on topics, so that they can better retain the knowledge.
Note: this course is not about writing the best code possible. There are many ways to skin a cat, in this course we show one (or sometimes a few), which should be suitable for the level of the participants.
To install Terraform, we recommend using a version manager such as tfswitch or tfenv.
Specifically for Windows, follow this step by step tutorial we've prepared: Click Here
For the exercises, we'll be using Terraform versions >=1.0.0
In order for Terraform to deploy infrastructure on AWS, Terraform needs to make authenticated API requests on your behalf.
You will need the AWS_ACCES_KEY_ID
and AWS_SECRET_ACCESS_KEY
from your account.
First make sure that you have the AWS CLI
installed. Then run aws configure
and fill in the access keys and configure the region and output format as follows:
aws configure
AWS Access Key ID [None]: $YOUR_ACCES_KEY_ID
AWS Secret Access Key [None]: $YOUR_SECRET_ACCES_KEY
Default region name [None]: eu-west-1
Default output format [None]: json
You should now be able to start with the exercises. cd
your way into the first exercise folder (a_provider_config
),
read the instructions in each README.md and try to write some Terraform code to solve the problem. Don't worry,
the majority of the code you can find in or adapt from the Terraform provider documentation!
Good luck, and remember: terrafrom init
, terraform plan
, and terrafrom apply
all the things!
P.S.: Exercise g_extra
will be very useful in your project since you'll have to deploy it in Amazon EMR Serverless and Amazon MSK and automate
the deployment.
Have a look at the Terraform documentation on Amazon EMR Serverless Here