/kubernetes-on-aws

Deploying Kubernetes on AWS with CloudFormation and Container Linux

Primary LanguageGoMIT LicenseMIT

Kubernetes on AWS

WORK IN PROGRESS

This repo contains configuration templates to provision Kubernetes clusters on AWS using Cloud Formation and CoreOS Container Linux.

Consider this as beta quality. Many values are hardcoded, and currently we're focusing on solving our own, specific/Zalando use case. However, we are open to ideas from the community at large about potentially turning this idea into a project that provides universal/general value to others. Please contact us via our Issues Tracker with your thoughts and suggestions.

Configuration in this repository initially was based on kube-aws, but now depends on three components which aren't all yet open sourced:

  • Cluster Registry to keep desired cluster states (e.g. used config channel and version)
  • Cluster Lifecycle Manager to provision the cluster's Cloud Formation stack and apply Kubernetes manifests for system components
  • Authnz Webhook to validate OAuth tokens and authorize access

We plan to release all required components to the community in Q2/2018.

Lean more about Zalando's cloud native journey by reading the Zalando Case Study on kubernetes.io. Please watch our meetup talk "Kubernetes on AWS at Europe's Leading Online Fashion Platform" on YouTube to learn how we run Kubernetes on AWS in production. See our Running Kubernetes in Production on AWS document for details on the setup.

Features

  • Highly available master nodes (ASG) behind ELB
  • Worker Auto Scaling Group with node pools support
  • Flannel overlay networking
  • Cluster autoscaling (using cluster-autoscaler)
  • Kubernetes DNS with node-local dnsmasq as daemonset and CoreDNS resolver for cluster.local domain.
  • Route53 DNS integration via External DNS
  • AWS IAM integration via kube2iam
  • Standard components are installed: kube-dns, heapster, dashboard, node exporter, kube-state-metrics
  • Webhook authentication and authorization (roles "ReadOnly", "PowerUser", "Manual", "Emergency", "Administrator")
  • Emergency Access via internal emergency-access-service, that grant roles "Manual" and "Emergency" with 4 eyes principle and audit logging
  • Log shipping via Scalyr
  • Full Ingress support with ALB/SSL via kube-ingress-aws-controller and HTTP routing via skipper
  • Enhanced usability with managed stacks and blue green deployments via stackset-controller and skipper
  • Static Egress IPs to route through NAT Gateways with Elastic IPs via kube-static-egress-controller
  • Horizontal Pod Autoscaling with scaling by request per second, SQS queue size or others via kube-metrics-adapter
  • EFS support
  • GPU support
  • ETCD backup via Kubernetes cronjob and etcdctl snapshot and upload to S3
  • Monitoring via Prometheus and zmon
  • Fully automated cluster updates via Cluster Lifecycle Manager
  • Planned: Spot Fleet integration (#61)

Notes

  • Node and user authentication is done via tokens (using the webhook feature)
  • SSL client-cert authentication is disabled
  • Many values are hardcoded
  • Secrets (e.g. shared token) are not KMS-encrypted in the cluster

Assumptions

  • The AWS account has a single Route53 hosted zone including a proper SSL cert (you can use the free ACM service)
  • The VPC has at least one public subnet per AZ (either AWS default VPC setup or public subnet named "dmz-<REGION>-<AZ>")
  • The VPC is in region eu-central-1 or eu-west-1
  • etcd cluster is available via DNS discovery (SRV records) at etcd.<YOUR-HOSTED-ZONE>
  • OAuth Token Info is available to validate user tokens

Directory Structure

  • cluster/cluster.yaml: Cloud Formation template files for the cluster (will be applied by Cluster Lifecycle Manager)
  • cluster/config-defaults.yaml: Default values for different kind of use that can be overriden by values from our cluster-registry (will be applied by Cluster Lifecycle Manager)
  • cluster/etcd-cluster.yaml: Senza Cloud Formation to deploy ETCD
  • cluster/manifests: Kubernetes manifests for system components (will be applied by Cluster Lifecycle Manager)
  • cluster/node-pools: Cloud Formation template files and userdata (cloud-init) for ContainerLinux node-pools (will be applied by Cluster Lifecycle Manager)
  • docs: extracts from internal Zalando documentation (https://kubernetes-on-aws.readthedocs.io/)