This document contains steps on how to bring up the environment. For a discussion on the architecture and design decisions, please see design.pdf
First, ensure that you've configured your AWS CLI accordingly. Setting that up is outside the scope of this guide so please go ahead and read up at https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
Grab the latest Terraform CLI from the official download page. The steps below also use the aws CLI, kubectl, helm, eksctl, docker, and jq, so install each of those via its respective guide as well.
terraform -chdir=terraform init
cp terraform/example.tfvars terraform/terraform.tfvars
Then modify the file as you see fit.
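For reference, the commands below expect at least the following keys to be set (the values shown here are placeholders, not recommendations):
profile              = "my-aws-profile"
region               = "us-west-2"
env_name             = "prod"
db_creds_secret_name = "whale-db-creds-prod"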
whale_aws_cli_profile=$(grep -E ' *profile *=' terraform/terraform.tfvars | sed -E 's/ *profile *= *"(.*)"/\1/g')
whale_aws_region=$(grep -E ' *region *=' terraform/terraform.tfvars | sed -E 's/ *region *= *"(.*)"/\1/g')
whale_env_name=$(grep -E ' *env_name *=' terraform/terraform.tfvars | sed -E 's/ *env_name *= *"(.*)"/\1/g')
whale_db_creds_secret_name=$(grep -E ' *db_creds_secret_name *=' terraform/terraform.tfvars | sed -E 's/ *db_creds_secret_name *= *"(.*)"/\1/g')
whale_secret_file=~/.whale/secrets/db_creds-${whale_env_name}.json
mkdir -p ~/.whale/secrets
chmod 0700 ~/.whale/secrets
cat > $whale_secret_file <<EOF
{
"db_user": "SU_$(uuidgen | tr -d '-')",
"db_pass": "$(uuidgen)"
}
EOF
chmod 0600 $whale_secret_file
aws secretsmanager create-secret \
--profile "$whale_aws_cli_profile" \
--name "$whale_db_creds_secret_name" \
--description "Whale DB credentials for ${whale_env_name} environment" \
--secret-string file://$whale_secret_file
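To double-check that the secret was stored, describe it (this reads only metadata and won't print the secret value):
aws secretsmanager describe-secret \
--profile "$whale_aws_cli_profile" \
--secret-id "$whale_db_creds_secret_name"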
First, get a hold of an FQDN that you own and define it in an env var:
whale_zone_fqdn=<TYPE-IN-YOUR-FQDN-HERE>
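For example (a placeholder value; use a domain or subdomain you actually control):
whale_zone_fqdn=whale.example.com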
Let's also create a unique caller reference:
whale_route53_caller_reference=$(uuidgen | tr -d '-')
Then, create the zone:
whale_aws_cli_profile=$(grep -E ' *profile *=' terraform/terraform.tfvars | sed -E 's/ *profile *= *"(.*)"/\1/g')
whale_aws_region=$(grep -E ' *region *=' terraform/terraform.tfvars | sed -E 's/ *region *= *"(.*)"/\1/g')
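The create-hosted-zone output below gets saved under tmp/, so make sure that directory exists:
mkdir -p tmp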
aws route53 create-hosted-zone \
--profile "$whale_aws_cli_profile" \
--name "$whale_zone_fqdn" \
--caller-reference "$whale_route53_caller_reference" > tmp/create-hosted-zone.out
List the nameservers for your zone:
cat tmp/create-hosted-zone.out | jq -r '.DelegationSet.NameServers[]'
Now update the NS records at your domain's registrar (or in the parent zone) so your domain delegates to the nameservers listed above.
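Once that change has propagated (it can take a while), you can verify the delegation with dig:
dig NS "$whale_zone_fqdn" +short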
With DNS delegation in place, bring up the rest of the infrastructure:
terraform -chdir=terraform apply
Proceed once the apply has finished.
Use ssh4realz to ensure you connect to the bastion securely. For a guide on how to (and why) use the script, see this video.
ssh4realz $(terraform -chdir=terraform output -raw bastion1_instance_id)
With the bastion's host key already saved to your known_hosts file, you can now SSH directly to its public IP.
ssh -A ubuntu@$(terraform -chdir=terraform output -raw bastion1_public_ip)
Back on your local machine, update your kubeconfig so kubectl can talk to the cluster:
aws eks --region=$(terraform -chdir=terraform output -raw region) \
update-kubeconfig \
--name $(terraform -chdir=terraform output -raw k8s_cluster_name)
kubectl config use-context $(terraform -chdir=terraform output -raw k8s_cluster_arn)
chmod 0600 ~/.kube/config
Check that you're able to connect to the kube-api-server:
kubectl get pods --all-namespaces
# Print out the DB endpoint for reference
terraform -chdir=terraform output db_endpoint
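If you like, split the endpoint into its host and port portions locally so they're easy to copy into the pod shell below (this assumes the output has the form host:port):
whale_db_endpoint=$(terraform -chdir=terraform output -raw db_endpoint)
whale_db_host=${whale_db_endpoint%%:*}
whale_db_port=${whale_db_endpoint##*:}
echo "$whale_db_host $whale_db_port"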
kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh
Once in the prompt, run:
/ # telnet <HOSTNAME-PORTION-OF-db_endpoint-OUTPUT> <PORT-PORTION-OF-db_endpoint-OUTPUT>
It should output:
Connected to <HOSTNAME>
To exit:
<Press Ctrl-] then Enter then e>
/ # exit
aws ecr get-login-password --region $(terraform -chdir=terraform output -raw region) | \
docker login --username AWS --password-stdin $(terraform -chdir=terraform output -raw registry_ui)
aws ecr get-login-password --region $(terraform -chdir=terraform output -raw region) | \
docker login --username AWS --password-stdin $(terraform -chdir=terraform output -raw registry_api)
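With Docker now authenticated against both repositories, you can build and push the UI and API images. As a rough sketch only (the build context ui/ and the latest tag are assumptions; the make targets later in this guide may already handle this for you):
docker build -t "$(terraform -chdir=terraform output -raw registry_ui):latest" ui/
docker push "$(terraform -chdir=terraform output -raw registry_ui):latest"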
For this section, we will follow the AWS EKS guide on deploying Prometheus with Helm:
kubectl create namespace prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade -i prometheus prometheus-community/prometheus \
--namespace prometheus \
--set alertmanager.persistentVolume.storageClass="gp2",server.persistentVolume.storageClass="gp2"
Watch for the status of each prometheus pod via:
watch -d kubectl get pods -n prometheus
Once all of them are up, temporarily set up port forwarding to access the Prometheus UI:
kubectl --namespace=prometheus port-forward deploy/prometheus-server 9090
Browse to http://localhost:9090
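While the port-forward is running, you can also confirm from another terminal that Prometheus reports itself ready:
curl -s http://localhost:9090/-/ready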
When you're done, hit Ctrl-C to stop the port forwarding.
OIDC will be used by some pods in the cluster to connect to the AWS API. This section is based on the AWS guide for creating an IAM OIDC provider for your cluster.
First check if the cluster already has an OIDC provider:
aws eks describe-cluster \
--region $(terraform -chdir=terraform output -raw region) \
--name $(terraform -chdir=terraform output -raw k8s_cluster_name) \
--query "cluster.identity.oidc.issuer" \
--output text
It should return something like:
https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
Now grep for the ID portion of that URL (the part after /id/) in your list of OIDC providers:
aws iam list-open-id-connect-providers | grep <EXAMPLED539D4633E53DE1B716D3041E>
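If you'd rather not copy the ID by hand, the same check can be scripted by combining the two commands above:
whale_oidc_id=$(aws eks describe-cluster \
--region $(terraform -chdir=terraform output -raw region) \
--name $(terraform -chdir=terraform output -raw k8s_cluster_name) \
--query "cluster.identity.oidc.issuer" \
--output text | cut -d '/' -f 5)
aws iam list-open-id-connect-providers | grep "$whale_oidc_id"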
If the above command returned an ARN, you're done with this section. If it did not return one, then run:
eksctl utils associate-iam-oidc-provider \
--region $(terraform -chdir=terraform output -raw region) \
--cluster $(terraform -chdir=terraform output -raw k8s_cluster_name) \
--approve
Rerun the aws iam list-open-id-connect-providers command above (including the pipe to grep) to double-check that the provider now shows up.
Next, install cert-manager:
kubectl apply --validate=false -f cert-manager/cert-manager.yaml
Watch for the status of each pod via:
watch -d kubectl get pods -n cert-manager
We will base the following steps on the AWS Load Balancer Controller installation guide.
aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file://aws-lb-controller/iam-policy.json | \
tee tmp/iam-policy.out
whale_aws_account_id=$(terraform -chdir=terraform output -raw account_id)
whale_k8s_cluster_name=$(terraform -chdir=terraform output -raw k8s_cluster_name)
eksctl create iamserviceaccount \
--cluster="$whale_k8s_cluster_name" \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--attach-policy-arn=arn:aws:iam::${whale_aws_account_id}:policy/AWSLoadBalancerControllerIAMPolicy \
--override-existing-serviceaccounts \
--approve
cat aws-lb-controller/load-balancer.yaml | \
sed 's@--cluster-name=WHALE_CLUSTER_NAME@'"--cluster-name=${whale_k8s_cluster_name}"'@' | \
kubectl apply -f -
Watch the controller pod come up via:
watch -d kubectl get pods -n kube-system
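You can also confirm that the controller deployment is available (the deployment name here assumes the stock manifest):
kubectl get deployment -n kube-system aws-load-balancer-controller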
Now create the application namespace and its database credentials secret:
kubectl create ns whale
whale_env_name=$(terraform -chdir=terraform output -raw env_name)
kubectl create secret generic postgres-credentials -n whale --from-env-file <(jq -r "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]" ~/.whale/secrets/db_creds-${whale_env_name}.json)
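Verify the secret landed with the expected keys (db_user and db_pass) without printing their values:
kubectl describe secret postgres-credentials -n whale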
The following steps are based on a cert-manager guide, plus a bit of a (working) hack.
Next, let's deploy the cluster issuer in another terminal:
whale_dns_zone=<TYPE-IN-YOUR-FQDN-HERE>
whale_env_name=$(terraform -chdir=terraform output -raw env_name)
whale_region=$(terraform -chdir=terraform output -raw region)
whale_hosted_zone_id=$(aws route53 list-hosted-zones | \
jq -r ".HostedZones[] | select(.Name==\"${whale_dns_zone}.\") | .Id" | \
rev | cut -d '/' -f 1 | rev)
whale_cert_manager_role_arn=$(terraform -chdir=terraform output -raw cert_manager_role_arn)
cat cert-manager/cluster-issuer.yaml | \
sed 's@WHALE_DNS_ZONE@'"${whale_dns_zone}"'@' | \
sed 's@WHALE_ENV_NAME@'"${whale_env_name}"'@' | \
sed 's@WHALE_REGION@'"${whale_region}"'@' | \
sed 's@WHALE_HOSTED_ZONE_ID@'"${whale_hosted_zone_id}"'@' | \
sed 's@WHALE_CERT_MANAGER_ROLE_ARN@'"${whale_cert_manager_role_arn}"'@' | \
kubectl apply -f -
Check that it created the secret for our app:
kubectl get secret ${whale_env_name}-tls -n cert-manager
First, let's follow events in the whale namespace so we know what's happening when we apply our manifests later:
kubectl get events -n whale -w
In your original terminal, deploy the UI:
make ui
In the other terminal session where you're watching events, wait for this line:
0s Normal CertificateIssued certificaterequest/whale-prod-tls-<pod-suffix> Certificate fetched from issuer successfully
Be patient though as it can take a few minutes and you'll see errors like this:
Error presenting challenge: Time limit exceeded. Last error:
or:
Failed build model due to ingress: whale/ingress-whale-api: none certificate found for host: ui.whale.kubekit.io
Ignore those. Check the status as well via:
https://check-your-website.server-daten.de/?q=ui.${whale_dns_zone}
scripts/configure-tls-resources ui <DNS_ZONE-FQDN-HERE>
Once this script completes, the AWS LB Controller will be able to create the ALB fronting the UI.
Next, deploy the API:
make api
In the other terminal session where you're watching events, wait for this line:
0s Normal CertificateIssued certificaterequest/whale-prod-api-tls-<pod-suffix> Certificate fetched from issuer successfully
Be patient though as it can take a few minutes and you'll see errors like this:
Error presenting challenge: Time limit exceeded. Last error:
or:
Failed build model due to ingress: whale/ingress-whale-api: none certificate found for host: api.whale.kubekit.io
Ignore those. Check the status as well via:
https://check-your-website.server-daten.de/?q=api.${whale_dns_zone}
scripts/configure-tls-resources api <DNS_ZONE-FQDN-HERE>
Once this script completes, the AWS LB Controller will be able to create the ALB fronting the API.
whale_env_name=$(terraform -chdir=terraform output -raw env_name)
whale_k8s_cluster_name=$(terraform -chdir=terraform output -raw k8s_cluster_name)
whale_aws_account_id=$(terraform -chdir=terraform output -raw account_id)
scripts/delete-tls-resources ui <DOMAIN-FQDN-HERE>
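The API's TLS resources presumably need the same treatment (mirroring the configure step earlier):
scripts/delete-tls-resources api <DOMAIN-FQDN-HERE>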
kubectl delete ns whale
kubectl delete -f aws-lb-controller/load-balancer.yaml
kubectl delete ns cert-manager
kubectl delete ns prometheus
eksctl delete iamserviceaccount \
--cluster=$whale_k8s_cluster_name \
--namespace=kube-system \
--name=aws-load-balancer-controller
cat tmp/iam-policy.out | \
jq -r '.Policy.Arn' | \
xargs -I {} aws iam delete-policy --policy-arn {}
terraform -chdir=terraform destroy
Finally, if you no longer plan to bring this environment up again later, clean up the following as well:
whale_db_creds_secret_name=$(grep -E ' *db_creds_secret_name *=' terraform/terraform.tfvars | sed -E 's/ *db_creds_secret_name *= *"(.*)"/\1/g')
aws secretsmanager delete-secret \
--force-delete-without-recovery \
--secret-id "$whale_db_creds_secret_name"
whale_zone_fqdn=<TYPE-IN-YOUR-FQDN-HERE>
whale_aws_cli_profile=$(grep -E ' *profile *=' terraform/terraform.tfvars | sed -E 's/ *profile *= *"(.*)"/\1/g')
whale_aws_region=$(grep -E ' *region *=' terraform/terraform.tfvars | sed -E 's/ *region *= *"(.*)"/\1/g')
Note that Route 53 will only delete a hosted zone once it contains nothing but its default NS and SOA records.
whale_zone_id=$(aws route53 list-hosted-zones --profile "$whale_aws_cli_profile" | \
jq -r ".HostedZones[] | select(.Name==\"${whale_zone_fqdn}.\") | .Id" | \
rev | cut -d '/' -f 1 | rev)
aws route53 delete-hosted-zone \
--profile "$whale_aws_cli_profile" \
--id "$whale_zone_id"