This repository has all the code required to create your own Kubernetes cluster with JupyterHub running in an AWS environment.
There are four steps required to create the AWS infrastructure, deploy a Kubernetes cluster and finally to deploy JupyterHub into the cluster.
In the first step we create the following AWS infrastructure with Terraform:
- VPC
- Internet gateway
- 2 public subnets with routes to the internet gateway
- 2 private subnets with NAT gateways with routes to the public subnets
- IAM instance profiles to be used for the master and nodes, as well as roles to be used for indexing and jupyter pods
- A postgres RDS instance to store the ODC's metadata index
- An S3 bucket to hold the Kubernetes state store.
This step requires the AWS CLI and Terraform to be installed:
- AWS CLI:
- Download: https://aws.amazon.com/cli/
- How to configure: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
- Terraform:
- Download: https://www.terraform.io/downloads.html
- How to use: https://www.terraform.io/intro/examples/index.html
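Before continuing, it can help to confirm that both tools are on your PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name and is not part of this repository.

```shell
# Print the name of every command passed as an argument that is not on PATH.
# missing_tools is a hypothetical helper, not part of this repository.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Report anything that still needs installing.
missing=$(missing_tools aws terraform)
if [ -n "$missing" ]; then
    echo "Please install: $missing"
else
    echo "AWS CLI and Terraform found"
fi
```

The same function can be reused later for the other tools this guide relies on, for example `missing_tools kubectl kops helm`.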
Now that you have the AWS CLI and Terraform installed, clone this repo into your environment:
git clone https://github.com/frontiersi/datacube-jupyterhub-kubernetes.git
Familiarise yourself with the below variables that are located in the vars.tfvars file which you can change to suit your environment:
Variable | Example | Notes |
---|---|---|
vpc_cidr | 10.1.0.0/16 | What VPC CIDR to use. |
public_subnet_cidr1 | 10.1.1.0/24 | A /24 block has 256 addresses (251 usable, since AWS reserves 5 per subnet), which is plenty for public subnet 1. |
public_subnet_cidr2 | 10.1.2.0/24 | Make sure your subnets do not overlap. |
private_subnet_cidr1 | 10.1.3.0/24 | |
private_subnet_cidr2 | 10.1.4.0/24 | |
region | ap-southeast-2 | Choose what region to deploy the resources. |
public_az1 | ap-southeast-2a | Choose the availability zone for each subnet. |
public_az2 | ap-southeast-2b | It is good practice to spread your subnets across availability zones. |
private_az1 | ap-southeast-2a | |
private_az2 | ap-southeast-2b | |
db_instance_type | db.t2.medium | Choose what size DB instance you want. |
db_instance_size | 20 | Choose how much storage for your database. |
node_count | 2 | Choose how many Kubernetes nodes you want. |
node_size | t2.medium | Choose the type of those nodes. |
master_count | 1 | Number of Kubernetes masters you want. Needs to be an odd number so a consensus can be reached. |
master_size | t2.micro | Choose the size of the master |
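The table asks you to make sure your subnets do not overlap. One way to check this before running Terraform is sketched below in POSIX shell. The function names (`ip_to_int`, `cidr_range`, `cidrs_overlap`) are hypothetical helpers, not part of this repository, and the sketch assumes well-formed IPv4 CIDR input.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address (as integers) covered by a CIDR block.
cidr_range() {
    addr=${1%/*}
    prefix=${1#*/}
    size=$(( 1 << (32 - prefix) ))
    start=$(( $(ip_to_int "$addr") / size * size ))   # align to the network boundary
    echo "$start $(( start + size - 1 ))"
}

# Succeed (exit 0) when two CIDR blocks share at least one address.
cidrs_overlap() {
    set -- $(cidr_range "$1") $(cidr_range "$2")      # start1 end1 start2 end2
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Check the two example public subnets from the table.
if cidrs_overlap 10.1.1.0/24 10.1.2.0/24; then
    echo "public subnets overlap"
else
    echo "public subnets do not overlap"
fi
```

Every subnet is expected to overlap with `vpc_cidr`, since subnets live inside the VPC range; the check is only meaningful between pairs of subnets.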
Terraform will prompt you for these variables:
Variable | Example | Notes |
---|---|---|
name | sample-odc-cluster | The name of your cluster. Only use letters, hyphens, or digits (0-9). |
domain | test.your-domain.io | Domain to use. If you want a local cluster, use k8s.local. |
db_name | dbname | Must be alphanumeric characters, underscores, or digits (0-9). |
db_username | dbusername | Must contain 1 to 63 alphanumeric characters. First character must be a letter. |
db_password | password | Must contain 8 to 128 characters. |
kubernetes_state_store | sample-odc-cluster-state-store-your-account-name | S3 bucket used for the Kubernetes state store. No two S3 buckets can share a name, so make yours unique. |
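The rules in the Notes column can be checked before Terraform prompts for the values. The sketch below mirrors three of those rules in POSIX shell; the function names are hypothetical and not part of this repository, and AWS may enforce additional constraints that are not covered here.

```shell
# name: letters, digits and hyphens only.
valid_cluster_name() {
    case $1 in
        ''|*[!A-Za-z0-9-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# db_username: 1 to 63 alphanumeric characters, starting with a letter.
valid_db_username() {
    [ "${#1}" -le 63 ] || return 1
    case $1 in
        [A-Za-z]*) ;;                  # first character is a letter
        *) return 1 ;;
    esac
    case $1 in
        *[!A-Za-z0-9]*) return 1 ;;    # reject anything non-alphanumeric
    esac
    return 0
}

# db_password: 8 to 128 characters.
valid_db_password() {
    [ "${#1}" -ge 8 ] && [ "${#1}" -le 128 ]
}

# Try the example values from the table.
if valid_cluster_name "sample-odc-cluster"; then echo "name ok"; fi
if valid_db_username "dbusername"; then echo "db_username ok"; fi
if valid_db_password "password"; then echo "db_password ok"; fi
```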
Now run the below commands to create the base infrastructure:
cd infrastructure/
terraform init
terraform workspace new sample-odc-cluster
terraform apply -var-file="vars.tfvars"
Alternatively, you can pass every variable with the -var parameter in one go:
terraform apply -var "name=sample-odc-cluster" -var "domain=test.your-domain.io" -var "kubernetes_state_store=sample-odc-cluster-state-store" -var "vpc_cidr=10.1.0.0/16" -var "public_subnet_cidr1=10.1.1.0/24" -var "public_subnet_cidr2=10.1.2.0/24" -var "private_subnet_cidr1=10.1.3.0/24" -var "private_subnet_cidr2=10.1.4.0/24" -var "region=ap-southeast-2" -var "public_az1=ap-southeast-2a" -var "public_az2=ap-southeast-2b" -var "private_az1=ap-southeast-2a" -var "private_az2=ap-southeast-2b" -var "db_instance_type=db.t2.medium" -var "db_instance_size=20" -var "db_name=sampleodccluster" -var "db_username=master" -var "db_password=foobartest" -var "node_count=2" -var "node_size=t2.medium" -var "master_count=1" -var "master_size=t2.micro"
This will create all the base infrastructure that Kubernetes will run on. It can take up to 10 minutes for all the resources to be created.
Now that you have the AWS infrastructure in place, you can create the Kubernetes cluster.
First you need the Kubernetes management tools kubectl and KOPS:
- Kubectl:
- Download and setup: https://kubernetes.io/docs/tasks/tools/install-kubectl/
- KOPS:
- Download and setup: https://github.com/kubernetes/kops#installing
To create the cluster, run the following commands:
cd infrastructure/
# Set the clustername and state store variables
cluster_name=$(terraform output cluster_name)
state_store=$(terraform output state_store)
export KOPS_STATE_STORE=$state_store
# Generate the kubernetes cluster config file
terraform output cluster-config > cluster.yaml
# Use kops to create the cluster
kops create -f cluster.yaml
# Specify the location of a public key to use to be able to access your master and nodes
kops create secret --name $cluster_name sshpublickey admin -i ~/.ssh/id_rsa.pub
# Run the update cluster command. The lifecycle overrides make kops use our pre-defined IAM roles instead of creating new ones
kops update cluster --name=$cluster_name --lifecycle-overrides IAMRole=ExistsAndWarnIfChanges,IAMRolePolicy=ExistsAndWarnIfChanges,IAMInstanceProfileRole=ExistsAndWarnIfChanges --yes
It can take up to 15 minutes for the cluster to be created. Check the status of the cluster with the following command:
kops validate cluster
Wait until you see that your cluster is ready before moving on to the next step.
Your cluster sample-odc-cluster.test.your-domain.io is ready
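Rather than re-running the validate command by hand, the check can be polled in a loop. The sketch below is a generic retry helper in POSIX shell; `retry_until_ok` is a hypothetical name, and the usage line assumes that `kops validate cluster` exits with a non-zero status until the cluster is ready.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry_until_ok <attempts> <delay_seconds> <command> [args...]
# retry_until_ok is a hypothetical helper, not part of kops.
retry_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $i of $attempts failed; retrying in ${delay}s" >&2
        sleep "$delay"
        i=$(( i + 1 ))
    done
    return 1
}

# Example (not run here): check every 30 seconds for up to 15 minutes.
#   retry_until_ok 30 30 kops validate cluster
```

The helper returns 0 as soon as the command succeeds and 1 if every attempt fails, so it can be chained with `&&` in front of the next step.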
We use the Helm package manager to install the required applications on top of the Kubernetes cluster.
This step requires Helm to be installed:
- Helm:
- Download: https://github.com/helm/helm#install
Run these commands to install Helm on your kubernetes cluster:
# Go to infrastructure Directory
cd infrastructure
# Get the clustername details
cluster_name=$(terraform output cluster_name)
state_store=$(terraform output state_store)
# Make sure the kops and kubectl details are set correctly
export KOPS_STATE_STORE=$state_store
kubectl config use-context $cluster_name
# Helm needs a service account and a pod for Tiller; the commands below create them
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller
It can take up to 5 minutes for Helm to be deployed into the cluster.
This step deploys Kube2IAM for pod-based security: https://github.com/jtblin/kube2iam#kube2iam. It allows you to control what AWS access your Kubernetes master, nodes and containers have.
# Go to the infrastructure Directory
cd infrastructure
# Get the clustername details
cluster_name=$(terraform output cluster_name)
state_store=$(terraform output state_store)
account_id=$(terraform output account_id)
domain=$(terraform output domain)
db_hostname=$(terraform output db_hostname)
# Make sure the kops and kubectl details are set correctly
export KOPS_STATE_STORE=$state_store
kubectl config use-context $cluster_name
# Installing kube2iam for role based pod control
helm install stable/kube2iam --name kube2iam --namespace kube-system --set=extraArgs.base-role-arn=arn:aws:iam::$account_id:role/,extraArgs.default-role=kube2iam-default,host.iptables=true,rbac.create=true,verbose=true,host.interface=cali+
The next step is to install JupyterHub. See the reference here: https://zero-to-jupyterhub.readthedocs.io/en/stable/setup-jupyterhub.html
First add the JupyterHub Helm repo:
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
Next, deploy JupyterHub. To do this you need to define the following variables, either in a jupyterhub/config.yaml file or using the --set parameter:
Variable | Example | Notes |
---|---|---|
proxy.secretToken | hex32 number | Run "openssl rand -hex 32" to generate your own. |
https.hosts | jupyter.sample-odc-cluster.test.your-domain.io | The domain that will be directed to the JupyterHub load balancer. |
https.letsencrypt.contactEmail | systems@test.your-domain.io | Email address used to obtain an SSL certificate. |
singleuser.extraEnv.DB_DATABASE | dbname | The db name you specified in step 1 |
singleuser.extraEnv.DB_HOSTNAME | db_hostname=$(terraform output db_hostname) | This is generated as an output from Terraform |
singleuser.extraEnv.DB_USERNAME | dbusername | The db username you specified in step 1 |
singleuser.extraEnv.DB_PASSWORD | password | The db password you specified in step 1 |
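The `proxy.secretToken` value is expected to be 32 random bytes written as 64 hexadecimal characters, which is the shape `openssl rand -hex 32` produces. If you paste a token in by hand, a quick sanity check such as the sketch below may catch truncated or mistyped values. `is_hex32_token` is a hypothetical helper name, not part of JupyterHub or Helm.

```shell
# Succeed when the argument is exactly 64 hexadecimal characters.
# is_hex32_token is a hypothetical helper, not part of JupyterHub or Helm.
is_hex32_token() {
    [ "${#1}" -eq 64 ] || return 1
    case $1 in
        *[!0-9a-fA-F]*) return 1 ;;   # reject any non-hex character
    esac
    return 0
}

# Generate a token and check it, if openssl is available.
if command -v openssl >/dev/null 2>&1; then
    token=$(openssl rand -hex 32)
    if is_hex32_token "$token"; then
        echo "token looks valid"
    else
        echo "unexpected token format" >&2
    fi
fi
```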
Run the following commands to deploy JupyterHub, replacing the placeholder values with the correct values for your environment:
# Generate a random token required for Jupyterhub
token="$(openssl rand -hex 32)"
# Now deploy JupyterHub
helm upgrade --install odchub jupyterhub/jupyterhub \
--namespace odchub --version 0.7.0 --debug \
--values ../jupyterhub/config.yaml \
--set proxy.secretToken=$token \
--set https.hosts=<enter a hostname that you own here, for example, jupyterhub.example.com> \
--set https.letsencrypt.contactEmail=<enter your email here> \
--set singleuser.extraEnv.DB_DATABASE=<enter your database name specified in step 1.4> \
--set singleuser.extraEnv.DB_HOSTNAME=$db_hostname \
--set singleuser.extraEnv.DB_USERNAME=<enter your database username specified in step 1.4> \
--set singleuser.extraEnv.DB_PASSWORD=<enter your database password specified in step 1.4>
Lastly, manually add a Route 53 record that matches the https.hosts variable you defined, pointing at the external-ip from this command:
kubectl --namespace=odchub get svc proxy-public --output=wide
Log into your AWS console and add a Route 53 record with the CNAME pointing at the external-ip from the above command.
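If you would rather not use the console, the same record could be created with the AWS CLI. The sketch below only builds the change batch JSON that `aws route53 change-resource-record-sets` accepts; `cname_change_batch` is a hypothetical helper name, and the hosted zone ID, TTL and hostnames in the commented usage are placeholders you would need to supply.

```shell
# Print a Route 53 change batch that creates or updates a CNAME record.
# Usage: cname_change_batch <record_name> <target_hostname>
# cname_change_batch is a hypothetical helper; the TTL of 300 is an assumption.
cname_change_batch() {
    cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{ "Value": "$2" }]
      }
    }
  ]
}
EOF
}

# Example (not run here); HOSTED_ZONE_ID and the hostname are placeholders:
#   elb=$(kubectl --namespace=odchub get svc proxy-public \
#         --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}')
#   cname_change_batch jupyterhub.example.com "$elb" > change.json
#   aws route53 change-resource-record-sets \
#       --hosted-zone-id "$HOSTED_ZONE_ID" --change-batch file://change.json
```

Using `UPSERT` means the command can be re-run safely if the load balancer hostname changes after a redeploy.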
You should then be able to log in at the URL you provided, for example: https://jupyterhub.example.com
When you want to remove your environment you need to remove the KOPS resources first followed by the Terraform resources. This is because Terraform will not be able to delete the subnets or vpc without the KOPS resources being removed first.
KOPS creates master and node servers as well as load balancers and autoscaling groups. You need to delete these via KOPS before you can remove the rest of the AWS infrastructure. To do this, enter the following command:
kops delete cluster $cluster_name --yes
This will take a couple of minutes.
You can remove your AWS infrastructure with the following commands:
cd infrastructure
terraform destroy -var-file="vars.tfvars"
- This will prompt you for some details, but you can enter dummy answers because they are not important for a destroy.
This will take around 10 minutes to remove all the resources.
If you get any errors from Terraform, check which resource is not being removed. You can then log into the AWS console and try removing it manually, which will give you a more detailed error. Usually a dependent resource was not deleted. For example, the KOPS delete cluster command may not have been run, so the resources it created still exist. Another example is a server or database that you manually spun up in the VPC. These need to be deleted before Terraform can remove the VPC or subnets.
Once you have removed all the dependent resources, you can run terraform destroy again.
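Because the order of removal matters, the two teardown commands can be wrapped so that Terraform only runs if the KOPS deletion succeeds. This is a minimal sketch: `run` and `teardown` are hypothetical helpers, not part of this repository, and the wrapper assumes it is run from the infrastructure/ directory with KOPS_STATE_STORE already exported.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Remove the KOPS cluster first, then the Terraform-managed infrastructure.
# Usage: teardown <cluster_name>
teardown() {
    run kops delete cluster "$1" --yes &&
    run terraform destroy -var-file="vars.tfvars"
}

# Preview the order of operations without touching any resources.
DRY_RUN=1
teardown sample-odc-cluster.test.your-domain.io
```

Unset `DRY_RUN` (or set it to 0) to perform the real teardown once the preview looks right.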