Terraform and AWS training
Pre-requisites
- Add AWS - Beach Okta Chiclet
- Install AWS CLI
- Install AWS CLI Session Manager Plugin
- Install SAML2AWS
- Install Terraform
Configure CLI access to AWS
We're going to use the saml2aws application to authenticate to AWS using our Okta credentials. The following snippet will configure saml2aws to use Okta Verify for multi-factor authentication. If you use Google Authenticator then replace --mfa OKTA with --mfa TOTP.
This is a one-time setup.
OKTA_USER=<you>@thoughtworks.com
saml2aws configure \
--idp-account tw-beach \
--idp-provider Okta \
--mfa OKTA \
--username "${OKTA_USER}" \
--url https://thoughtworks.okta.com/home/amazon_aws/0oa1c9mun8aIqVj7I0h8/272 \
--skip-prompt
Login to AWS
Having set up saml2aws, we can now use it to authenticate. We're going to log in by providing our Okta password and MFA token, and then set environment variables that the AWS CLI will use for authentication. Finally, we're going to verify that everything has worked. The result of calling aws sts get-caller-identity should be some JSON identifying your UserId, Account and Arn.
OKTA_USER=<you>@thoughtworks.com
saml2aws login \
--idp-account tw-beach \
--profile default \
--region eu-west-1 \
--username ${OKTA_USER}
eval $(saml2aws script -a tw-beach)
aws sts get-caller-identity
This will provide us with credentials that are valid for 1 hour. To get a new token, re-run the following commands before the credentials expire:
saml2aws login -a tw-beach
eval $(saml2aws script -a tw-beach)
Introduction to Terraform
Terraform is a tool for declaratively defining infrastructure resources. Terraform is cloud-agnostic: you can provision resources on AWS, GCP, Azure, Alibaba Cloud, DigitalOcean and many other platforms, and manage many other types of resource as well.
Project outline
In this example we define a minimal project with a typical structure.
cd work
cp ../01-outline/* .
terraform init
terraform apply
cat terraform.tfstate
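As a rough illustration of what a minimal project can look like, here is a sketch. It is not necessarily what 01-outline contains: the file split in the comments, the owner variable's default and the greeting output are all assumptions made for this example.

```hcl
# main.tf (hypothetical): settings for Terraform itself
terraform {
  required_version = ">= 1.0"
}

# variables.tf (hypothetical): inputs to the project
variable "owner" {
  description = "Initials of the person who owns these resources"
  type        = string
  default     = "abc" # placeholder value, assumed for this sketch
}

# outputs.tf (hypothetical): values reported after `terraform apply`
# and recorded in terraform.tfstate
output "greeting" {
  value = "Hello from ${var.owner}"
}
```

Even with no resources defined, terraform apply writes the output values into terraform.tfstate, which is why inspecting the state file is a useful first step.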
Add AWS Provider
Terraform uses plugins to define the different types of resources it can manage. It calls these plugins providers. The list of providers can be found on the Terraform Registry. We're going to use the AWS Provider.
In this example, we use the aws_caller_identity data source to retrieve our authentication details (just like calling aws sts get-caller-identity).
cp ../02-aws-provider/* .
terraform init
terraform apply
cat terraform.tfstate
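A minimal sketch of configuring the AWS provider and reading the caller identity might look like the following. The version constraint, the region and the output names are assumptions for illustration; the files in 02-aws-provider may differ.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed version constraint
    }
  }
}

# Credentials are picked up from the environment variables set by saml2aws
provider "aws" {
  region = "eu-west-1"
}

# Data sources read existing information; they do not create anything
data "aws_caller_identity" "current" {}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "caller_arn" {
  value = data.aws_caller_identity.current.arn
}
```

terraform init needs to be run again after adding a provider so that the plugin is downloaded.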
Introduction to AWS
AWS has data-centres in many parts of the world. Each of these parts of the world is known as a region. Regions have names such as eu-west-1 or us-east-2. Each region has multiple physically independent locations with very high-bandwidth, low-latency connections between them. Each of these locations (made up of one or more data-centres) is known as an availability zone (AZ). Availability zones are identified by a single character appended to the region name, e.g. eu-west-1a and eu-west-1b.
Computer infrastructure comes down to 3 types of resource (and combinations of those):
- Compute
- Storage
- Networking
VPC
For any provisioned (i.e. not serverless) resources we need a network to connect them to. AWS provides a resource known as a Virtual Private Cloud (VPC) that allows us to create our own private network for connecting our resources.
Each resource that we connect to the network requires one or more IP addresses. We need to make sure we provision a network that's big enough for all the things we might want to provision. If we aren't going to connect our private network to any other private networks (through mechanisms like peering and Direct Connect) then we can make it as big as we like, up to the /16 maximum that AWS allows for a VPC.
The size of the network (and the set of IP addresses it will use) is defined using CIDR notation. The IP specified is the first IP of the block, and the number after the slash is the prefix length, i.e. the number of leading bits that are fixed. The smaller the number after the slash, the bigger the network: 10.0.0.0/16 contains 65,536 addresses while 10.0.0.0/24 contains 256.
The IP addresses should come from one of the private (RFC 1918) ranges:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
cp ../03-aws-vpc/* .
terraform apply
cat terraform.tfstate
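For reference, a VPC definition can be as small as the sketch below. The resource name main, the tag value and the chosen CIDR block are assumptions for illustration rather than the exact contents of 03-aws-vpc.

```hcl
provider "aws" {
  region = "eu-west-1"
}

# A /16 block from the 10.0.0.0/8 private range: 65,536 addresses
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "abc-vpc" # hypothetical name; use your own initials
  }
}

output "vpc_id" {
  value = aws_vpc.main.id
}
```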
You can destroy the infrastructure you've created by using:
terraform destroy
EXERCISE
No resources are ever provisioned directly into a VPC. Instead, resources are provisioned into a subnet. Subnets provide a way of splitting the VPC into smaller pieces.
Provision two aws_subnet resources within our VPC. Associate each of the subnets with a different availability zone.
N.B. you can use the subnet calculator to help work out the CIDR blocks for these subnets.
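As an alternative to working the blocks out by hand, Terraform's built-in cidrsubnet function can derive them. The sketch below is only a hint, not a solution to the exercise; the local names are hypothetical and the VPC range is assumed to be 10.0.0.0/16.

```hcl
locals {
  vpc_cidr = "10.0.0.0/16" # assumed VPC range

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 gives /24 blocks
  subnet_cidrs = [
    cidrsubnet(local.vpc_cidr, 8, 0), # "10.0.0.0/24"
    cidrsubnet(local.vpc_cidr, 8, 1), # "10.0.1.0/24"
  ]
}

output "subnet_cidrs" {
  value = local.subnet_cidrs
}
```

Expressions like these can also be tried interactively with terraform console.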
Looping and conditionals
Terraform provides facilities for doing basic looping. A resource can have a count attribute. Terraform will create count instances of the resource. The current index is available via count.index.
A locals block introduces local variables. Local variables are referenced as local.<var-name>. Local variables can only be set from within the code.
Terraform expressions and functions can be used to add imperative logic to Terraform scripts, but these should be used with caution. The example here is a good example of why that caution is required!
Try different values of the az_count and include_db variables to see their effect.
cp ../05-looping/* .
terraform plan -var=owner=<initials> -var=region=eu-west-1 -var=vpc_cidr=10.0.0.0/16
Variables can also be set as environment variables:
export TF_VAR_owner=<initials>
export TF_VAR_region=eu-west-1
export TF_VAR_vpc_cidr=10.0.0.0/16
terraform plan
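The count, locals and conditional constructs described in this section can be sketched as follows. The variable names match the ones used above, but the defaults, resource names and CIDR layout are assumptions; the code in 05-looping may be organised differently.

```hcl
variable "owner" {
  type = string
}

variable "region" {
  type = string
}

variable "vpc_cidr" {
  type = string
}

variable "az_count" {
  type    = number
  default = 2 # must not exceed the length of local.az_suffixes
}

variable "include_db" {
  type    = bool
  default = false
}

provider "aws" {
  region = var.region
}

locals {
  # Hypothetical helper list: one suffix per availability zone
  az_suffixes = ["a", "b", "c"]
}

resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr

  tags = {
    Name = "${var.owner}-vpc"
  }
}

# Looping: one subnet per availability zone, indexed by count.index
resource "aws_subnet" "private" {
  count             = var.az_count
  vpc_id            = aws_vpc.main.id
  availability_zone = "${var.region}${local.az_suffixes[count.index]}"
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
}

# Conditional: a count of 0 means the resource is not created at all
resource "aws_subnet" "db" {
  count             = var.include_db ? var.az_count : 0
  vpc_id            = aws_vpc.main.id
  availability_zone = "${var.region}${local.az_suffixes[count.index]}"
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 100)
}
```

Using count as an on/off switch is a common idiom, but it is also a good illustration of the caution mentioned above: the intent is less obvious than a plain resource declaration.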
Creating an Instance
In this next example, we want to create a virtual machine (an aws_instance) that we can connect to with a shell.
Our VPC is not connected to the Internet so we can't SSH onto the host. We're going to use an AWS feature called Systems Manager Session Manager to connect to our host.
We now have a lot more to contend with. To make this happen, our virtual machine needs to call AWS APIs to interact with the AWS service, and it needs to call these APIs without an Internet connection. To enable this we're going to use a feature called VPC Interface Endpoints (aws_vpc_endpoint). These provision Elastic Network Interfaces (ENIs) into our subnets that can receive the API requests and transmit them over the AWS network to the AWS services. We need the DNS entries for these services to point at our ENIs, so we need to enable DNS support on our VPC.
In our Terraform code we're using the "splat" operator to reference a set of attributes from a resource we created where the count attribute was set.
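A sketch of how the splat operator and the interface endpoints might fit together is shown below. The resource names, the CIDR layout and the list of endpoint services are assumptions for illustration and may not match 06-aws-instance.

```hcl
variable "owner" {
  type = string
}

variable "region" {
  type = string
}

provider "aws" {
  region = var.region
}

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # Services that Session Manager talks to (assumed list for this sketch)
  ssm_services = ["ssm", "ssmmessages", "ec2messages"]
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true # needed so that service names resolve to our ENIs
  enable_dns_hostnames = true
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}

# Firewall rules for the endpoints: accept HTTPS from inside the VPC only
resource "aws_security_group" "endpoints" {
  name   = "${var.owner}-endpoints"
  vpc_id = aws_vpc.main.id

  ingress {
    description = "HTTPS from inside the VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }
}

resource "aws_vpc_endpoint" "ssm" {
  count               = length(local.ssm_services)
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.${local.ssm_services[count.index]}"
  vpc_endpoint_type   = "Interface"
  private_dns_enabled = true
  security_group_ids  = [aws_security_group.endpoints.id]

  # Splat: the id of every subnet created by the counted resource above
  subnet_ids = aws_subnet.private[*].id
}
```

Here aws_subnet.private[*].id evaluates to a list containing the id of each subnet, which saves writing out aws_subnet.private[0].id, aws_subnet.private[1].id and so on.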
The VPC Interface Endpoints require us to set up a Security Group (aws_security_group) to govern which traffic the endpoint can receive. A Security Group is a kind of firewall.
When we create an instance we define the base image that will be used to provision it. These are called Amazon Machine Images (AMIs). We're going to look up the latest Ubuntu release from Canonical.
Instance types define how much CPU, memory, network capacity and ephemeral disk our instance will have.
As our instance will need to call AWS APIs to support Session Manager, it will need permissions to allow this. The AWS API authentication/authorization system is called Identity and Access Management (IAM). Our instance will also need a security group that lets it send the AWS API requests.
cp ../06-aws-instance/* .
terraform apply
INSTANCE_ID=$(aws ec2 describe-instances --query 'Reservations[*].Instances[?State.Name==`running` && Tags[?Key==`Name` && Value==`<your-initials>-instance`]].InstanceId[]' --output text)
aws ssm start-session --target "${INSTANCE_ID}"
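The AMI lookup, IAM permissions and instance described in this section can be sketched roughly as follows. The vpc_id and subnet_id variables, the resource names, the Ubuntu release in the name filter and the t3.micro instance type are assumptions for illustration; 06-aws-instance may differ.

```hcl
variable "owner" {
  type = string
}

variable "region" {
  type = string
}

variable "vpc_id" {
  type = string # hypothetical input: id of an existing VPC
}

variable "subnet_id" {
  type = string # hypothetical input: id of an existing subnet
}

provider "aws" {
  region = var.region
}

# Latest Ubuntu image published by Canonical (release chosen for this sketch)
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account id

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# IAM role that EC2 instances are allowed to assume
resource "aws_iam_role" "instance" {
  name = "${var.owner}-instance"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# AWS-managed policy granting the permissions Session Manager needs
resource "aws_iam_role_policy_attachment" "ssm" {
  role       = aws_iam_role.instance.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "instance" {
  name = "${var.owner}-instance"
  role = aws_iam_role.instance.name
}

# Allow the instance to send HTTPS requests to the AWS APIs
resource "aws_security_group" "instance" {
  name   = "${var.owner}-instance"
  vpc_id = var.vpc_id

  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "main" {
  ami                    = data.aws_ami.ubuntu.id
  instance_type          = "t3.micro"
  subnet_id              = var.subnet_id
  vpc_security_group_ids = [aws_security_group.instance.id]
  iam_instance_profile   = aws_iam_instance_profile.instance.name

  tags = {
    Name = "${var.owner}-instance"
  }
}
```

The Name tag is what the describe-instances query above filters on, so it has to follow the <your-initials>-instance pattern for the lookup to find the host.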
EXERCISE
AWS has a concept of an Auto Scaling Group (ASG). An ASG enables us to provision a group of hosts and will ensure that the number of hosts always corresponds to the desired capacity. It can also be configured to automatically update the desired capacity based on various metrics.
Replace our aws_instance
with an aws_auto_scaling_group
that has a minimum capacity of 1 instance and a maximum capacity of 2 instances. When scaled up - the instances should be distributed across our subnets.
You can scale it up using the AWS CLI (assuming you named your ASG <your-initials>-instance-group):
aws autoscaling set-desired-capacity --auto-scaling-group-name <your-initials>-instance-group --desired-capacity 2
Internet Access
Up until this point, our VPC has been entirely disconnected from the Internet. We've been able to access our instance using AWS services but our instance has been unable to reach the Internet.
In this example, we add Internet access to our VPC. To allow a VPC to connect to the Internet we need to provision an Internet Gateway (aws_internet_gateway). However, our instance doesn't have a public IP address, so we also need to provision a NAT Gateway (aws_nat_gateway). Traffic from our private subnets is routed to the NAT Gateway, which runs in one of our public subnets and does have a public IP address. Traffic from our public subnets (including the NAT Gateway) is routed to the Internet Gateway.
@startuml
top to bottom direction
package pvsa as "private-subnet-a" {
node ina as "instance-a"
}
package pvsb as "private-subnet-b" {
node inb as "instance-b"
}
package pusa as "public-subnet-a" {
node ngw as "nat gw"
}
package pusb as "public-subnet-b" {
node inc as "instance-c"
}
package igw as "internet gw"
cloud "internet"
ina --> ngw
inb --> ngw
ngw --> igw
inc --> igw
igw --> internet
@enduml
We no longer require our VPC Interface Endpoints to reach the AWS APIs as we can now reach them over the Internet.
cp ../08-internet-access/* .
terraform apply
INSTANCE_ID=$(aws ec2 describe-instances --query 'Reservations[*].Instances[?State.Name==`running` && Tags[?Key==`Name` && Value==`<your-initials>-instance-group`]].InstanceId[]' --output text)
aws ssm start-session --target "${INSTANCE_ID}"
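The routing shown in the diagram can be sketched as below. The input variables stand in for the VPC and subnets created earlier and, like the resource names, are assumptions for illustration; the domain argument on aws_eip assumes version 5 or later of the AWS provider.

```hcl
variable "region" {
  type = string
}

variable "vpc_id" {
  type = string # hypothetical input: id of the existing VPC
}

variable "public_subnet_id" {
  type = string # hypothetical input: the subnet that hosts the NAT Gateway
}

variable "private_subnet_ids" {
  type = list(string) # hypothetical input: subnets without public addresses
}

provider "aws" {
  region = var.region
}

resource "aws_internet_gateway" "main" {
  vpc_id = var.vpc_id
}

# Public IP address for the NAT Gateway
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = var.public_subnet_id

  # The NAT Gateway needs the Internet Gateway to exist first
  depends_on = [aws_internet_gateway.main]
}

# Public subnets send Internet-bound traffic to the Internet Gateway
resource "aws_route_table" "public" {
  vpc_id = var.vpc_id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  subnet_id      = var.public_subnet_id
  route_table_id = aws_route_table.public.id
}

# Private subnets send Internet-bound traffic to the NAT Gateway
resource "aws_route_table" "private" {
  vpc_id = var.vpc_id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

resource "aws_route_table_association" "private" {
  count          = length(var.private_subnet_ids)
  subnet_id      = var.private_subnet_ids[count.index]
  route_table_id = aws_route_table.private.id
}
```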
Provisioning
We've got a host that we can connect to, but it's not doing anything useful. We need to provision some software onto a host to provide some value.
We can use cloud-init to configure and provision software onto a host. We provide the cloud-init configuration to the AWS instances via the user_data or user_data_base64 attributes of aws_instance, aws_launch_template and aws_launch_configuration.
We're going to provision a host that runs NGINX and that can be reached from the Internet.
cp ../09-provisioning/* .
terraform apply
NGINX=$(aws ec2 describe-instances --query 'Reservations[*].Instances[?State.Name==`running` && Tags[?Key==`Name` && Value==`<your-initials>-instance-group`]].PublicDnsName[]' --output text)
curl http://$NGINX
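A minimal sketch of passing a cloud-init configuration through user_data is shown below. The ami_id variable, the template name and the exact cloud-init content are assumptions for illustration; 09-provisioning may install NGINX differently.

```hcl
variable "region" {
  type = string
}

variable "ami_id" {
  type = string # hypothetical input: id of an Ubuntu AMI
}

provider "aws" {
  region = var.region
}

locals {
  # cloud-init reads this YAML on first boot; the first line must be "#cloud-config"
  cloud_init = <<-EOT
    #cloud-config
    package_update: true
    packages:
      - nginx
    runcmd:
      - systemctl enable --now nginx
  EOT
}

resource "aws_launch_template" "nginx" {
  name_prefix   = "nginx-"
  image_id      = var.ami_id
  instance_type = "t3.micro"

  # Launch templates expect user data to be base64-encoded
  user_data = base64encode(local.cloud_init)
}
```

With aws_instance the plain text can be passed directly via user_data, as the provider handles the encoding for that resource.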
EXERCISE
Currently we have to go and find the public address of our NGINX instance using the AWS API. This doesn't work if we have more than one instance in our auto-scaling group, and the value can change as the auto-scaling group expands or contracts or if an instance is replaced.
Add a classic load-balancer (aws_elb) to front our auto-scaling group. Add a DNS entry of <your-initials>.daleaws.co.uk that points to the load-balancer (aws_route53_record using an alias).
Cleanup
Now clean everything up:
terraform destroy