Kubernetes Ops is the culmination of how we have helped clients use Kubernetes over the years. There has been a lot of trial and error as we have grown up with Kubernetes. This project represents how we currently help our clients use Kubernetes and maintain their infrastructure. You can view it as a reference implementation of a fully production-ready Kubernetes setup.
We lean towards the immutable infrastructure and GitOps methodologies and use no configuration management tools. Everything starts out in Git as either code or configuration. We edit those files to describe the desired state, and that state is then applied to the infrastructure.
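To make the desired-state idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that both the state declared in Git and the live state can be flattened into dictionaries. The `plan_changes` function and the sample keys are hypothetical; they are not part of this repository, Terraform, or Kops.

```python
def plan_changes(desired, actual):
    """Return the changes needed to move `actual` to `desired`.

    Both arguments are hypothetical flat dicts of setting name -> value.
    """
    return {
        # Declared in Git but missing from the live infrastructure.
        "create": sorted(k for k in desired if k not in actual),
        # Present in both places but with different values.
        "update": sorted(
            k for k in desired if k in actual and desired[k] != actual[k]
        ),
        # Live but no longer declared in Git.
        "delete": sorted(k for k in actual if k not in desired),
    }


if __name__ == "__main__":
    # Illustrative values only; real state would come from your tooling.
    desired = {"worker_count": 6, "kubernetes_version": "1.16", "bastion": "enabled"}
    actual = {"worker_count": 3, "kubernetes_version": "1.16", "legacy_subnet": "10.0.9.0/24"}
    print(plan_changes(desired, actual))
```

In practice the tools listed further down compute this kind of plan for you; the sketch only shows why keeping the desired state in Git gives reviewers something concrete to compare against.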
One of the biggest problems this repository helps with is what people are
starting to call "day 2" problems (where "day 1" is creation). The "day 1"
problems are well documented, and there are plenty of tutorials for them.
The problem with these tutorials and examples is that they mostly leave you hanging
on how to move forward with the infrastructure pieces. Day 2 problems include: what is
the upgrade, patching, and modification strategy? How do I manage the infrastructure Git repository?
This is where we think we can contribute.
With our experience managing many Kubernetes clusters over the years, we think
we can provide this information. Creating your cluster is about 10 to maybe
20 percent of the infrastructure activity (if that). Making changes to the infrastructure
to suit your needs as time moves on is the bulk of the activity, followed finally by
deleting all or parts of the infrastructure as new items come into play.
If you follow the instructions through, you might think this is overly complex. If this is your first time playing around with Kubernetes, it probably is overly complex, and this project is probably not well suited to you at this time. When you want to take Kubernetes into production, this is where we think this set of methodologies starts to shine. For example, it shows you how to lay out a repository for your infrastructure, and it gives you the process and workflow to create and update infrastructure pieces. From working with many clients, we have found that managing infrastructure this way makes sense and works well in both large and small teams.
Aren't there already projects that bring up Kubernetes for me? Why don't I just use GKE, EKS, AKS, *KS?
Yes, there are, and we use all of them. You can use any one of those services, go to its web console, and bring up a Kubernetes cluster. In our opinion and experience, this is fine if you are testing out Kubernetes or just trying out something new. It is fast, and it is easy to understand what is going on. However, when you want to bring that "new thing" into production, managing it that way is not ideal. It is hard to reproduce from dev to qa to prod. Manual changes are hard to track and very error prone.
We stress that this project does not represent the "easy" way of managing infrastructure. It represents a way to manage infrastructure with a GitOps flow, in a sane way that lets a team of people work on it together.
This project mainly follows a GitOps workflow. Changes to code or configs are made in a branch of this repository. A PR can be opened on that branch so that other team members can review the changes. Then, depending on your merging practices and automation, the change is either applied and then merged, or merged and then applied, to any one environment.
We have found that having an "infrastructure" repository makes sense. These items need to live somewhere. They are usually not application code, and they form an entity of their own. The infrastructure repository also usually gets fairly large over time as new items get added to the software stack and new requirements for services come along.
As an organization grows, the infrastructure and this repository also tend to be maintained by a different set of people. You have application developers and DevOps or infrastructure groups. Even if those two teams sit in the same group, keeping this repository separate gives a good delineation of what is actually being changed. If items in here are changed, it is clearly an infrastructure-related change.
We mainly use open source tools. Some paid tools might eventually end up in this repository, and we will explicitly label those.
Infrastructure building:
- Terraform
- Terragrunt
- Kops
Kubernetes:
- Kops using pure upstream Kubernetes
- Helm
- Helm Charts from their repository
- All kinds of other open source Kubernetes packages
Kubernetes clusters:
- Kops
- GKE
- EKS
There are various docs and guides in the docs directory.
This is the first thing you should read. This has all of the setup information that you will need to get started.
This is a more manual walkthrough of how to create a cluster using this project. The intention here is to give you a deep dive into what goes into creating a Kops cluster:
The "easier way" takes the manual steps in the previous example and hides most of them in a script that you can just run:
- A very isolated VPC with only a few public IP addresses exposed to the internet
- Dedicated subnets for each item type. This allows you to segregate items better.
- Redundant Kubernetes masters in 3 availability zones
- Redundant Kubernetes worker nodes in 3 availability zones
Kubernetes on GCP via GKE clusters
- A very isolated VPC with only a few public IP addresses exposed to the internet
- Dedicated subnets for each item type. This allows you to segregate items better.
- Redundant Kubernetes masters in 3 availability zones
- Redundant Kubernetes worker nodes in 3 availability zones
This is the list of services that are maintained for each cloud:
Service Name | Supported in AWS | Supported in GCP | Source |
---|---|---|---|
cert-manager | yes | yes | helm/stable |
cluster-autoscaler | yes | no | helm/stable |
external-dns | yes | yes | helm/stable |
graylog | yes | yes | helm/stable |
jenkins | yes | yes | helm/stable |
kube-bench | yes | yes | helm/stable |
kube-downscaler | yes | yes | helm/stable |
loki | yes | yes | loki |
nginx-ingress | yes | yes | helm/stable |
prometheus blackbox exporter | yes | yes | helm/stable |
prometheus operator | yes | yes | helm/stable |
sumologic-fluentd | yes | yes | helm/stable |
threatstack | yes | yes | Threatstack |
helm tiller -rbac enabled | yes | yes | - |
vault-helm | yes | yes | Hashicorp |
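
The support matrix above can also be expressed as data, which makes it easy to check a planned set of services against a target cloud before applying anything. The following Python sketch is illustrative only: the `SUPPORTED` dictionary is transcribed from the table, while the `unsupported_services` helper is a hypothetical name and not something this repository provides.

```python
# Support matrix transcribed from the table: service -> clouds it is maintained for.
SUPPORTED = {
    "cert-manager": {"aws", "gcp"},
    "cluster-autoscaler": {"aws"},
    "external-dns": {"aws", "gcp"},
    "graylog": {"aws", "gcp"},
    "jenkins": {"aws", "gcp"},
    "kube-bench": {"aws", "gcp"},
    "kube-downscaler": {"aws", "gcp"},
    "loki": {"aws", "gcp"},
    "nginx-ingress": {"aws", "gcp"},
    "prometheus blackbox exporter": {"aws", "gcp"},
    "prometheus operator": {"aws", "gcp"},
    "sumologic-fluentd": {"aws", "gcp"},
    "threatstack": {"aws", "gcp"},
    "helm tiller -rbac enabled": {"aws", "gcp"},
    "vault-helm": {"aws", "gcp"},
}


def unsupported_services(services, cloud):
    """Return the requested services that are not maintained for `cloud`.

    Services missing from the matrix are treated as unsupported.
    """
    return [s for s in services if cloud not in SUPPORTED.get(s, set())]


if __name__ == "__main__":
    wanted = ["cert-manager", "cluster-autoscaler", "loki"]
    print(unsupported_services(wanted, "gcp"))
```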