Declarative pull-based GitOps repository representing the state of a Kubernetes cluster

gitops-k8s

This document aims to provide an opinionated working solution leveraging Kubernetes and proven GitOps techniques to have a resilient, composable and scalable Kubernetes platform.

Nothing outlined below is new or innovative, but it should be at least a good starting point to have a cluster up and running pretty quickly and give you a chance to remain focused and try out new ideas.

Feedback and help are always welcome!


Introduction

TL;DR

  • Kubernetes is a declarative system
  • Git can be used to describe infrastructure and applications
  • Git repository is the source of truth and represents a cluster
  • GitOps is a way to do Continuous Delivery and operate Kubernetes via Git pull requests
  • GitOps empowers developers to do operations
  • CI pipelines should only run builds, tests and publish images
  • In a pull-based approach, an operator deploys new images from inside of the cluster
  • You can only observe the actual state of the cluster and react when it diverges from the desired state

Imperative vs Declarative

In an imperative system, the user knows the desired state, determines the sequence of commands to transition the system to the desired state and supplies a representation of the commands to the system.

By contrast, in a declarative system, the user knows the desired state, supplies a representation of the desired state to the system, then the system reads the current state and determines the sequence of commands to transition the system to the desired state.

Declarative systems have the distinct advantage of being able to react to unintended state changes without further supervision. In the event of an unintended state change leading to a state drift, the system may autonomously determine and apply the set of mitigating actions leading to a state match. This process is called a control loop, a popular choice for the implementation of controllers.
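
The control loop described above can be sketched in a few lines of shell. This is only an illustration under simplifying assumptions, not how Kubernetes controllers are implemented: the "state" is a single replica count kept in two hypothetical files, and `reconcile` and `control_loop` are helper names invented for this example.

```shell
#!/bin/sh
# Minimal control-loop sketch (illustrative only).
# DESIRED_FILE and ACTUAL_FILE are hypothetical stand-ins for the declared
# state and the state observed in the running system.
STATE_DIR=$(mktemp -d)
DESIRED_FILE="$STATE_DIR/desired"
ACTUAL_FILE="$STATE_DIR/actual"

# reconcile: read both states and apply one corrective step if they differ.
# Returns 0 when the states match, 1 when a step was applied.
reconcile() {
  desired=$(cat "$DESIRED_FILE")
  actual=$(cat "$ACTUAL_FILE")
  if [ "$actual" -eq "$desired" ]; then
    return 0
  elif [ "$actual" -lt "$desired" ]; then
    echo $((actual + 1)) > "$ACTUAL_FILE"   # scale up by one
  else
    echo $((actual - 1)) > "$ACTUAL_FILE"   # scale down by one
  fi
  return 1
}

# control_loop: keep reconciling until no drift is left.
control_loop() {
  until reconcile; do :; done
}

echo 3 > "$DESIRED_FILE"   # the user declares the desired state
echo 0 > "$ACTUAL_FILE"    # the system starts somewhere else
control_loop
echo "actual=$(cat "$ACTUAL_FILE")"   # prints "actual=3"
```

The user only ever writes the desired value. If something else later changes the actual value, running `control_loop` again brings it back without any new instructions.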

What is GitOps?

GitOps is the art and science of using Git pull requests to manage infrastructure provisioning and software deployment.

The concept of GitOps originated at Weaveworks, whose developers described how they use Git to create a single source of truth. Kubernetes is a declarative system and by using declarative tools, the entire set of configuration files can be version controlled in Git.

More generally, GitOps is a way to do Continuous Delivery and operate Kubernetes via Git.

Push vs Pull

In a push-based pipeline, the CI system runs the build and tests, followed by a deployment directly to Kubernetes. This is an anti-pattern: a CI server is not an orchestration tool. A CI job fails when it encounters a difference, which can leave the cluster in a partial and unknown state. You need something that continually attempts to make progress until there are no more diffs.

In a pull-based pipeline, a Kubernetes operator deploys new images from inside of the cluster. The operator notices when a new image has been pushed to the registry. Convergence of the cluster state is then triggered and the new image is pulled from the registry, the manifest is automatically updated and the new image is deployed to the cluster.

A CI pipeline should be used to merge and integrate updates with master, while with GitOps you should rely on Kubernetes or the cluster to internally manage deployments based on those master updates.

You could potentially have multiple clusters pointing to the same GitOps repository, but you won't have a centralized view of them: all the clusters will be independent.

Observability

Git provides a source of truth for the desired state of the system and observability provides a source of truth for the actual state of the running system.

You cannot say what actual state is in the cluster. You can only observe it. This is why diffs are so important.

A system is observable if developers can understand its current state from the outside. Observability is a property of systems like Availability and Scalability. Monitoring, Tracing and Logging are techniques for baseline observations.

Observability is a source of truth for the actual running state of the system right now. You observe the running system in order to understand and control it. Observed state must be compared with the desired state in Git, and usually you want to monitor and alert when the system diverges from the desired state.
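
Comparing observed state with desired state can be sketched with plain `diff`. The example below is a simplified illustration: the two files and their contents are hypothetical, and `check_drift` is a helper name made up for this sketch. Against a real cluster, `kubectl diff -f <file>` performs a comparable check between a manifest and the live objects.

```shell
#!/bin/sh
# Drift-detection sketch (illustrative only).
# check_drift DESIRED ACTUAL: compare two state files and report divergence.
# Exit status 0 means in sync, 1 means drift was detected.
check_drift() {
  if diff -u "$1" "$2" > /dev/null; then
    echo "in sync"
    return 0
  fi
  echo "drift detected"
  diff -u "$1" "$2"   # show what diverged, similar to a pull request diff
  return 1
}

# Hypothetical example data: the desired state as committed to Git ...
WORK_DIR=$(mktemp -d)
printf 'replicas: 3\nimage: guestbook:v2\n' > "$WORK_DIR/desired.yaml"
# ... and the state observed in the running system.
printf 'replicas: 2\nimage: guestbook:v2\n' > "$WORK_DIR/observed.yaml"

check_drift "$WORK_DIR/desired.yaml" "$WORK_DIR/observed.yaml" \
  || echo "alert: cluster diverged from Git"
```

The non-zero exit status is what an alerting hook would key on; the printed diff tells the operator which fields drifted.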

Argo CD

Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. It automates the deployment of the desired application states in the specified target environments. In this project Kubernetes manifests are specified as helm charts.

This guide explains how to set up the whole infrastructure in a few steps via GitOps with Argo CD. Note that it's not tightly coupled to any specific vendor and you should be able to easily run it on DigitalOcean, EKS or GKE for example.

architecture

Most of the steps have been kept manual on purpose, but they should be automated in a production environment.

Prerequisites

  • Setup required tools
  • Create a Kubernetes cluster locally or with your favourite provider
  • Download the cluster configs and test connection
    export KUBECONFIG=~/.kube/<CLUSTER_NAME>-kubeconfig.yaml
    kubectl get nodes
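
Before continuing, it can help to verify that the tools are actually on the `PATH`. The following is a small sketch: `require_tools` is a hypothetical helper, and the tool list (`kubectl`, `make`, `argocd`) is an assumption based on the commands used in this guide, so adjust it to your setup.

```shell
#!/bin/sh
# Prerequisite check sketch (illustrative only).
# require_tools TOOL...: report every tool that is missing from PATH.
# Returns 0 when all tools are available, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The tool list is an assumption based on the commands used in this guide.
if require_tools kubectl make argocd; then
  echo "all required tools found"
else
  echo "install the missing tools before continuing"
fi
```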

Bootstrap

  1. TODO Setup secrets (optional)
  2. Setup Argo CD and all the applications
    make bootstrap
  3. Access Argo CD
    # username: admin
    # password: (autogenerated) the pod name of the Argo CD API server
    kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2
    
    # port forward the service
    kubectl port-forward service/argocd-server -n argocd 8080:443
    
    # from the UI
    [open|xdg-open] https://localhost:8080
    # from the CLI
    argocd login localhost:8080 --username admin
    • On Chrome, you might need to enable the flag chrome://flags/#allow-insecure-localhost (Allow invalid certificates for resources loaded from localhost) to access it
  4. First time only sync all the OutOfSync applications
    • manually
    • TODO with a cronjob (optional)
    • verify guestbook example
    # port forward the service
    kubectl port-forward service/guestbook-ui -n guestbook 8081:80
    # open browser
    [open|xdg-open] http://localhost:8081
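
The `[open|xdg-open]` notation in the steps above means "use `open` on macOS or `xdg-open` on Linux". A small wrapper can make that choice automatically. This is a sketch only: `pick_opener`, `open_url` and the `OPENERS` override variable are names invented for this example.

```shell
#!/bin/sh
# Sketch of a cross-platform "open" helper (illustrative only).
# pick_opener: print the first available opener command, or fail.
# The candidate list can be overridden through the OPENERS variable.
pick_opener() {
  for candidate in ${OPENERS:-xdg-open open}; do
    if command -v "$candidate" > /dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no opener found, open the URL manually" >&2
  return 1
}

# open_url URL: open the URL with whichever opener is available.
open_url() {
  opener=$(pick_opener) || return 1
  "$opener" "$1"
}

# Usage, once the matching port-forward is running:
#   open_url https://localhost:8080   # Argo CD
#   open_url http://localhost:8081    # guestbook example
```

`xdg-open` is tried first because on some Linux systems `open` is an unrelated command.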

This is how it should look in the UI

argocd-ui

Applications

Applications in this repository are defined in the parent applications chart and are logically split into folders which represent Kubernetes namespaces.

charts

ambassador namespace is dedicated to Ambassador, a lightweight Kubernetes-native microservices API gateway built on the Envoy Proxy, which is mainly used for routing and supports canary deployments, traffic shadowing, rate limiting, authentication and more

# retrieve EXTERNAL-IP
kubectl get service ambassador -n ambassador
[open|xdg-open] http://<EXTERNAL-IP>/ambassador
[open|xdg-open] http://<EXTERNAL-IP>/httpbin/
[open|xdg-open] http://<EXTERNAL-IP>/guestbook

# debug ambassador
kubectl port-forward service/ambassador-admins 8877 -n ambassador
[open|xdg-open] http://localhost:8877/ambassador/v0/diag

Ambassador is disabled by default because the recommended way is to use host-based routing which requires a domain

For a working example on DigitalOcean using external-dns you can have a look at niqdev/do-k8s

TODO Service mesh

observe namespace is dedicated to observability, specifically monitoring, alerting and logging

  • prometheus-operator provides monitoring and alerting managing Prometheus, Alertmanager and Grafana

    # prometheus
    kubectl port-forward service/prometheus-operator-prometheus 8001:9090 -n observe
    
    # alertmanager
    kubectl port-forward service/prometheus-operator-alertmanager 8002:9093 -n observe
    
    # grafana
    # username: admin
    # password: prom-operator
    kubectl port-forward service/prometheus-operator-grafana 8003:80 -n observe
  • kube-ops-view provides a read-only system dashboard for multiple k8s clusters

    kubectl port-forward service/kube-ops-view -n observe 8004:80

EFK stack for logging

  • elasticsearch is a distributed, RESTful search and analytics engine and it is used for log storage

    kubectl port-forward service/elasticsearch-master 9200:9200 -n observe
  • cerebro is an Elasticsearch web admin tool

    kubectl port-forward service/cerebro 9000:80 -n observe
  • kibana visualizes and queries the log data stored in an Elasticsearch index

    kubectl port-forward service/kibana-kibana 9001:5601 -n observe
  • fluentbit is a fast and lightweight Log Processor and Forwarder

  • elasticsearch-curator or curator helps to curate, or manage, Elasticsearch indices and snapshots
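
Opening one terminal per port-forward gets tedious, so the commands listed for this namespace can be driven from a single table. The sketch below reuses the service names and ports from the commands above; `forward_all` and the `DRY_RUN` switch are hypothetical names introduced for this example. By default it only prints the commands, so it is safe to run without a cluster.

```shell
#!/bin/sh
# Sketch: forward all the observe services at once (illustrative only).
# Each entry is service:local-port:remote-port, copied from the commands above.
FORWARDS="
prometheus-operator-prometheus:8001:9090
prometheus-operator-alertmanager:8002:9093
prometheus-operator-grafana:8003:80
kube-ops-view:8004:80
elasticsearch-master:9200:9200
cerebro:9000:80
kibana-kibana:9001:5601
"

# forward_all: start one port-forward per entry in the background.
# With DRY_RUN=1 (the default here) the commands are only printed.
forward_all() {
  for entry in $FORWARDS; do
    service=${entry%%:*}   # text before the first colon
    ports=${entry#*:}      # local:remote, text after the first colon
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "kubectl port-forward service/$service $ports -n observe"
    else
      kubectl port-forward "service/$service" "$ports" -n observe &
    fi
  done
}

forward_all
# To actually start the forwards and keep them running until Ctrl+C:
#   DRY_RUN=0 forward_all; wait
```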

kube-system namespace is reserved for Kubernetes system applications

  • kubernetes-dashboard is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage applications running in the cluster and troubleshoot them, as well as manage the cluster itself

    kubectl port-forward service/kubernetes-dashboard -n kube-system 8000:443
  • metrics-server is an add-on which extends the metrics api group and enables the Kubernetes resource HorizontalPodAutoscaler

    kubectl top node
    kubectl top pod --all-namespaces
  • spotify-docker-gc performs garbage collection in the Kubernetes cluster and the default configurations have the gc running once a day which:

    • removes containers that exited more than an hour ago
    • removes images that don't belong to any container
    • removes volumes that are not associated to any remaining container

TODO (not in order)

  • bump argocd to latest version
  • argocd: example secrets for private charts
  • argocd: override default admin.password
  • argocd-bootstrap: open source and explain solution of how to sync automatically first time with cronjob
  • expose argocd over http i.e. --insecure flag
  • configure TLS/cert and authentication on ambassador for all services
  • centralize auth on ambassador/istio
  • Jaeger tracing
  • kube-monkey or chaoskube
  • explain how to switch cluster via DNS
  • Kafka from public chart + JMX fix
  • stateless vs stateful: disaster recovery strategy e.g. S3 backup/restore
  • example with multiple providers: DigitalOcean, EKS, GKE
  • add prometheus adapter for custom metrics that can be used by the HorizontalPodAutoscaler
  • explain how to test a branch i.e. change target revision from the UI
  • TODO fix alertmanager: error: unrecognized log format "<nil>", try --help
  • add screenshots to readme for each app
  • explain how to add grafana dashboards with ConfigMap
  • add alerting example on Slack/PagerDuty
  • add example of prometheus ServiceMonitor + dashboard
  • explain how to init es index on kibana for logging + screenshot
  • add kubefwd to docs
  • argocd issue: Add support for secrets in Application parameters
  • argocd issue: Helm repository as first class Argo CD Application source