/environment

A production-like environment for developers, DevOps, QA, SecOps, sysadmins, ops, and SREs to test applications in production-like scenarios.

Important: If any commands require sudo privileges and your user doesn't have passwordless sudo enabled, copy the commands from the Makefile and run them in your favorite shell.

Important: This was set up as a proof of concept of a production system. For local development, use a 1 control-plane, 2 worker-node configuration and RAM usage will stay under control (assuming the system has at least 5 GB of RAM available).

Important: For any custom configuration, rename .env.template to .env and use it with the Makefile. Look for commands with custom in their name.

Important: The GitLab Omnibus dockerized installation provided here is for demonstration purposes only and is not suitable for production use. Harden the gitlab.rb settings and ensure encryption and SSL before setting up a standalone GitLab Omnibus deployment.

Install Docker

Install Docker with,

make install-docker

Install Go (optional, only necessary if you want to build KinD node images)

Install Go with,

make install-go

Set the path in your .bashrc or .zshrc,

export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME/go

Install KinD

Install KinD with,

make install-kind

Private image registry

If bootstrapping the KinD cluster pulls more than 100 docker.io images in a span of 6 hours, you'll hit the Docker pull rate limit (the images are pulled anonymously, so the 200 pulls per logged-in session won't apply). Another case is that you may want to load custom images directly into the cluster without going through a Docker image registry. Note the imagePullPolicy setting: it shouldn't be Always, and images shouldn't use the latest tag.

In this case the easiest solution is to pull all Docker images to the local machine and load them into the KinD cluster with,

kind load docker-image IMAGE_NAME:TAG
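
For example, to pre-pull an image on the host and load it into a cluster created with a custom name (the image and cluster name below are placeholders),

docker pull nginx:1.21
kind load docker-image nginx:1.21 --name YOUR_CLUSTER_NAME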

The longest but safest option in the long run (I trust you NOT to use self-signed certs and distribute them using kind-config.yaml in any kind of production environment) is to host a private registry with Harbor and keep all necessary images in it. If you have the patience to upload every image your cluster needs to the private registry, then congratulations!! You are one step closer to creating an air-gapped, secure cluster. Add the domain name for the Harbor setup against your IP (not localhost) in the /etc/hosts file.

The commands below are run in order from the Harbor folder; please update them with your own values if needed,

make harbor-cert
make harbor-download
make harbor-yml
make harbor-prepare
make harbor-install

Stop and start Harbor containers if needed,

make harbor-down
make harbor-up

Update the private_repo variable in .env and run the following command to pull, tag and push the necessary Docker images to your private registry,

make cluster-private-images
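
Under the hood this amounts to roughly the following per image (the image is illustrative, the registry host comes from private_repo, and the Harbor project path is just an example),

docker pull k8s.gcr.io/metrics-server/metrics-server:v0.4.4
docker tag k8s.gcr.io/metrics-server/metrics-server:v0.4.4 harbor.localdomain.com:9443/library/metrics-server:v0.4.4
docker push harbor.localdomain.com:9443/library/metrics-server:v0.4.4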

To use images from the private image registry, look for the custom variants of the commands.

Update the nfs_share variable in .env and copy the certificates to the NFS share so that pods like Jenkins can trust the registry for Docker-related operations,

make create-cert-nfs-dir
make copy-cert-nfs

Build KinD node-image

Download the Kubernetes source code with, (this will take a few minutes)

make download-k8s-source

Build a custom node image and tag it for your private image registry. (Optional TODO) Add more packages to your node images if necessary.

make build-node-image

Create KinD cluster

If you are not using a private image registry like Harbor, create the KinD cluster with,

make cluster-create

For any custom settings, such as a private image registry, copy cluster/kind-config-custom.yaml.template to cluster/kind-config-custom.yaml and update it with your certificate and key names and mount point. Update apiServerAddress and apiServerPort with your current IP address and any port on which to expose the cluster.

Important: The KinD cluster should not be exposed publicly. These settings are not suitable for any production environment. Please be aware of the security concerns before exposing a local KinD cluster publicly.

networking:
  disableDefaultCNI: true
  apiServerAddress: "YOUR_IP"
  apiServerPort: YOUR_PORT
# extraMounts is a per-node field, so place it under the relevant entry in the nodes: section
nodes:
  - role: control-plane
    extraMounts:
      - containerPath: /etc/ssl/certs
        hostPath: harbor/certs
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.localdomain.com:9443"]
    endpoint = ["https://harbor.localdomain.com:9443"]
  [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor.localdomain.com".tls]
    cert_file = "/etc/ssl/certs/harbor.localdomain.com.cert"
    key_file  = "/etc/ssl/certs/harbor.localdomain.com.key"

Run the following first,

make custom-mode

Create cluster with,

make cluster-create-custom

Delete KinD cluster

Destroy the KinD cluster with, (NFS storage contents won't be deleted)

make delete-cluster

TL;DR:

If you are lazy like me and don't want to read through all these commands:

For a private image registry (set up Harbor first, then copy the custom cluster config and certificates as in the steps above), set up everything except GitLab and Jenkins with,

make all-custom -i

For Docker Hub and public image repositories, set up everything except GitLab and Jenkins with,

TODO

Regenerate kubeconfig

With every system reboot, the exposed API server endpoint and certificate in the kubeconfig will change. Regenerate the kubeconfig of the current cluster for kubectl with,

make kubectl-config

This will not work for HA setups: the haproxy loadbalancer container doesn't get the certificate update this way. Copying the API address, IP and certificate over to the loadbalancer Docker container process is still a TODO. For an HA KinD cluster you have to destroy the cluster before every shutdown and recreate it later.
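
If you prefer to do it without the Makefile, KinD's own command also regenerates the kubeconfig for the current (non-HA) cluster; the cluster name is whatever you created it with,

kind export kubeconfig --name YOUR_CLUSTER_NAME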

Common Troubleshooting:

If the cluster creation process takes a long time at the "Starting control-plane" step and exits with an error similar to,

The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

It means you probably have physical or virtual network settings that KinD does not work with. For example, a KVM bridge network requires a bridge network and a bridge-slave network based on the physical network interface; KinD does not support this scenario. After reverting to the default network connection based on the physical network device, the setup process completed.

Kubernetes version 1.21 node images were used to set up this cluster. Provide your customized cluster name in the Makefile commands in the create and delete cluster sections. Two clusters with the same name can't exist.

If you need to use a different Kubernetes node image version, be aware of Kubernetes feature gates and their default values for that version. If a feature gate defaults to true, the KinD config doesn't support setting it to true again via the cluster config YAML files. For example, TTLAfterFinished is true by default in 1.21 but false in previous versions, so specifying it as true for a 1.21 cluster in the featureGates section of cluster/kind-config.yaml won't work.
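
For gates that are still off by default in your target version, the setting looks roughly like this in cluster/kind-config.yaml (the gate named below is only an example of one that was off by default in 1.21),

featureGates:
  "EphemeralContainers": true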

If Docker restarts for any reason, please check whether the loadbalancer container is started automatically. Otherwise you can't regenerate the kubeconfig for kubectl when it is unable to connect to the KinD cluster.

Create cluster network

Create cluster network using CNI manifests,

make cluster-network

Here the Calico manifest is used with BGP peering and a pod CIDR of 192.168.0.0/16. For an updated version or any change to the manifest, download it from,

curl https://docs.projectcalico.org/manifests/calico.yaml -O

All Calico pods must be running before installing other components in the cluster. If you want to use a different CNI, download its manifest and replace the filename in the Makefile.

Run the following command to have the Calico manifest pull from the private registry,

make cluster-network-custom

If the pod description shows an error like x509: certificate signed by unknown authority, make sure your domain and CA certificates are available inside the KinD nodes (Docker containers) and that the containerd CRI can access them.
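
A quick way to check from the host (the node container name depends on your cluster name),

docker exec -it YOUR_CLUSTER_NAME-control-plane ls /etc/ssl/certs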

If the pod description shows errors like liveness and readiness probes failed, make sure the pod IP range does not overlap your LAN network IP range.

Delete cluster network

Delete Calico CNI with,

make cluster-network-delete

For custom mode,

make cluster-network-custom-delete

Install NFS server

If an NFS server isn't installed, run this command to install it and configure the NFS location,

make install-nfs-server

Add your location in this format to the /etc/exports file,

YOUR_NFS_PATH *(rw,sync,no_root_squash,insecure,no_subtree_check)

Restart NFS server to apply changes,

sudo systemctl restart nfs-server.service
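
To confirm the export is visible (showmount ships with the NFS utilities),

showmount -e localhost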

Create NFS storage class, Metallb loadbalancer, dashboard, metric server, serviceaccount

The k8s-sigs.io/nfs-subdir-external-provisioner storage provisioner is used to better simulate a production scenario where log, metric and data storage are usually centralized and retained even if containers get destroyed and rescheduled.

Rename nfs-deploy.yaml.template to nfs-deploy.yaml and update the following values with your own (make sure write permission is present on the folder); a sketch of where they end up follows the list,

YOUR_NFS_SHARE_PATH
YOUR_NFS_SERVER_IP
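
In the rendered nfs-deploy.yaml these placeholders typically end up in the provisioner's env and volume definitions, roughly like this (the values shown are examples),

        env:
          - name: NFS_SERVER
            value: 192.168.1.10      # YOUR_NFS_SERVER_IP
          - name: NFS_PATH
            value: /srv/nfs/kubedata # YOUR_NFS_SHARE_PATH
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.1.10
            path: /srv/nfs/kubedata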

The MetalLB loadbalancer is used to simulate a production scenario where different services are assigned IP addresses or domain names by cloud-based loadbalancer services. On premises this is generally handled by a loadbalancer like HAProxy, which balances and routes traffic to the appropriate nodes. MetalLB is not strictly required to run the stack; a simple NodePort service will work for development purposes as well.

Rename metallb-config.yaml.template to metallb-config.yaml and update the following values with your own (a sketch follows the list),

IP_RANGE_START
IP_RANGE_END
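
Assuming the older ConfigMap-based MetalLB configuration, the rendered file looks roughly like the sketch below; the layer2 address range must be free on your network (the one shown fits the default kind Docker network and is only an example),

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 172.18.255.200-172.18.255.250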

Kubernetes dashboard, metrics server and cluster-admin role service account manifests are added. Please don't use this service account for anything remotely related to production systems.

Apply the manifest files using kustomization,

make cluster-config

For custom images, assuming you have already pushed properly tagged images to your private registry (make sure you updated the files in the custom folder),

make cluster-config-custom

Access the dashboard using kubectl proxy and the service account token,

kubectl proxy
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
make get-token

Delete NFS storage class, Metallb loadbalancer, dashboard, metric server, serviceaccount

Delete the manifest files using kustomization,

make cluster-config-delete

For custom mode,

make cluster-config-custom-delete

Apply Elasticsearch-Fluentd-Kibana (EFK) log management system manifests

The EFK stack is used without SSL configuration or custom index, filter and tag-rewrite rules. This is to simulate a logging scenario in a production environment. Custom configuration can be applied to the Fluentd daemonset using a ConfigMap; maybe a generic config file will be included in the future. Elasticsearch runs as a statefulset and, as long as it is not deleted using the manifest files, it will retain data in the NFS share location and persist across any pod restart or reschedule. Kibana runs on NodePort 30003, so make sure to enable that port on a control-plane node in the KinD cluster config, as sketched below.
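
A minimal sketch of the relevant fragment of the KinD cluster config (KinD's extraPortMappings; the host port choice is up to you),

nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 30003
        hostPort: 30003
        protocol: TCP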

Apply manifests with,

make cluster-logging

For custom mode with a private image registry,

make cluster-logging-custom

Delete Elasticsearch-Fluentd-Kibana (EFK)

Delete EFK with, (the persistent volumes will be renamed with the prefix archived, and the data will not be available unless copied manually to new volumes)

make cluster-logging-delete

For custom mode,

make cluster-logging-custom-delete

Apply Prometheus-Grafana monitoring system

Prometheus, Grafana, Alertmanager and the custom CRDs associated with them are taken exactly as-is from the kube-prometheus project (https://github.com/prometheus-operator/kube-prometheus). Please note the Kubernetes compatibility matrix and download the appropriate release for your version; this setup uses release-0.8. Before applying the manifests, go to manifests/grafana-service.yaml and add a NodePort to the service, as sketched below.
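
A minimal sketch of that change (the existing port entry in the file may differ slightly; only type and nodePort need to be added, and the nodePort value is just an example),

spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30006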

Rename pv.yaml.template to pv.yaml and update the following values with your own (make sure write permission is present on the folder),

YOUR_NFS_SHARE_PATH
YOUR_NFS_SERVER_IP

Create a new file manifests/grafana-credentials.yaml with the following contents so that the persistent admin:admin@123 credentials are applied even if the Grafana pod is restarted.

apiVersion: v1
kind: Secret
metadata:
  name: grafana-credentials
  namespace: monitoring
data:
  user: YWRtaW4=
  password: YWRtaW5AMTIz
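
The data values are just base64-encoded strings; to use different credentials, regenerate them with,

echo -n 'admin' | base64
echo -n 'admin@123' | base64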

Add the following env entries in manifests/grafana-deployment.yaml to use the persistent credentials,

        env:
        - name: GF_SECURITY_ADMIN_USER
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: user
        - name: GF_SECURITY_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              name: grafana-credentials
              key: password

In manifests/grafana-deployment.yaml, replace this volume definition,

      - emptyDir: {}
        name: grafana-storage

with this one,

      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-storage-pv-claim

Apply setup prerequisites with,

make cluster-monitoring-setup

Apply manifests with,

make cluster-monitoring

For private image registry in custom mode,

make cluster-monitoring-setup-custom
make cluster-monitoring-custom

Delete Prometheus-Grafana monitoring system

Delete prometheus, grafana, alertmanager and custom CRDs with,

make cluster-monitoring-delete
make cluster-monitoring-uninstall

For custom mode,

make cluster-monitoring-custom-delete
make cluster-monitoring-custom-uninstall

Service mesh

Istio is used as the service mesh. Install the istioctl CLI with,

make cluster-istioctl-install

Create the istio-system namespace and install the Istio core components with the demo profile. Modify the istioctl install command to enable any other modules or configurations,

make cluster-istio-install

Install the Istio components using the private image registry,

make cluster-istio-custom-install

Optional: Enable addons

Apply the Grafana, Prometheus, Kiali and Jaeger manifests to trace service communication and see service-mesh metrics. Expose the Grafana and Kiali dashboards on NodePorts. In istio/samples/addons/grafana.yaml update the grafana service with the following,

spec:
  type: NodePort
  ports:
    - name: service
      port: 3000
      protocol: TCP
      targetPort: 3000
      nodePort: 30004

In istio/samples/addons/kiali.yaml update the kiali service with the following,

spec:
  type: NodePort
  ports:
  - name: http
    protocol: TCP
    port: 20001
    nodePort: 30005
  - name: http-metrics
    protocol: TCP
    port: 9090

Apply the manifests with, (if any error comes up on the first run, please run it again)

make cluster-istio-addons

For private image registries, apply the service port changes in files first, then run following,

make custom-mode
make cluster-istio-custom-addons

If any error comes up for first run, apply manifests again with,

make cluster-istio-custom-addons-apply

Delete istio service mesh

Delete istio components, addons and custom CRDs with,

make cluster-istio-delete

For custom mode,

make cluster-istio-custom-delete

Helm install (not used)

To install Helm v3, run the following; then use helm repo add repo_name repo_address to add a repo and helm install release_name repo_name/chart_name to install a chart,

make cluster-helm-install
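
For example (the repo and chart below are only illustrations),

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install my-nginx bitnami/nginx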

Gitlab install

Set up the GitLab CE dockerized installation by renaming docker-compose.yml.template to docker-compose.yml in the gitlab folder and setting your IP, SSH and HTTP ports there. From the gitlab folder run the following,

make gitlab-up
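
For reference, a filled-in docker-compose.yml typically ends up looking something like the sketch below; the ports, hostname and volume paths are placeholders, and the template in the gitlab folder is authoritative,

services:
  gitlab:
    image: gitlab/gitlab-ce:latest
    hostname: gitlab.localdomain.com
    restart: always
    ports:
      - "3080:80"   # HTTP
      - "3022:22"   # SSH
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'http://YOUR_IP:3080'
    volumes:
      - YOUR_NFS_SHARE_PATH/gitlab/config:/etc/gitlab
      - YOUR_NFS_SHARE_PATH/gitlab/logs:/var/log/gitlab
      - YOUR_NFS_SHARE_PATH/gitlab/data:/var/opt/gitlab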

To view progress log,

make log

To check the status, see running processes, or enter the container,

make check-status
make check-system
make shell

To stop, start, restart gitlab,

make stop
make start
make restart

To get the initial root admin account password, or to set it,

make get-root-pass
make set-root-pass

Log into GitLab at http://YOUR_IP:3080 with the user root and the password from the commands above. If it is hosted on a non-private IP, disable new user sign-up from the admin panel.

To take full backup,

make before-backup take-backup after-backup

Delete gitlab

Delete and remove GitLab with,

make gitlab-down
sudo rm -rf YOUR_NFS_SHARE_PATH/gitlab/*

Jenkins install

Rename pv.yaml.template to pv.yaml in the jenkins folder and update the following values with your own (make sure write permission is present on the folder),

YOUR_NFS_SHARE_PATH
YOUR_NFS_SERVER_IP

If you don't want to execute Docker-related operations (build, run, etc.), remove the docker container from the containers section in jenkins/jenkins-deploy.yaml. You can also declare a separate docker:dind-based deployment and service manifest, put the cluster DNS name of that DinD service in the DNS: env variable, and replace the DOCKER_HOST value with it in jenkins/jenkins-deploy.yaml.

Important: In testing, Docker DinD as a separate service was found to randomly close connections. If no other pod is going to use the DinD service, it is recommended to attach it as a sidecar of the Jenkins container.
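
If you do go the separate-service route, the override amounts to something like this in jenkins/jenkins-deploy.yaml (the service DNS name and port are placeholders; use 2376 together with the TLS certs if the daemon is TLS-enabled),

        env:
        - name: DOCKER_HOST
          value: tcp://docker-dind.jenkins.svc.cluster.local:2375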

Install jenkins in cluster with,

make cluster-jenkins

Install Jenkins in the cluster with a custom image derived from the LTS version in your private image registry.

In Jenkins folder,

make build

In root folder,

make custom-mode
make cluster-jenkins-custom

Get the initial Jenkins password to log in with the admin account,

make get-jenkins-token

Get jenkins service account access token for jenkins pipeline,

make get-jenkins-sa-token

If you want to install jenkins standalone with docker-compose,

In Jenkins folder,

make build
make jenkins-up
make log
make get-token

Detailed steps on how to create CI pipelines in Jenkins are in the README of the jenkins folder.

Delete jenkins

Delete jenkins from cluster,

make cluster-jenkins-delete

For custom mode,

make cluster-jenkins-custom-delete

For docker-compose in jenkins folder,

make jenkins-down

To delete jenkins data,

sudo rm -rf YOUR_NFS_SHARE_PATH/jenkins/*

MinIO

AWS S3-like object bucket storage is provided by MinIO. The storage class is k8s-nfs with the NFS share backend, using the MinIO operator and the custom tenants.minio.min.io CRD. The old way, probably sufficient for most development cases, can be found at https://github.com/kubernetes/examples/tree/master/staging/storage/minio, which provides both standalone and statefulset examples. For this repo the MinIO operator from https://github.com/minio/operator is used. init.yaml is generated and applied first, and then the tenant.yaml file is generated.

kubectl minio init --namespace minio-operator -o > minio/init.yaml
kubectl create namespace minio
kubectl minio tenant create minio --servers 1 --volumes 4 --capacity 200Gi --namespace minio --storage-class k8s-nfs -o > minio/tenant.yaml

You can go ahead and modify the login credentials in the minio-creds-secret secret in tenant.yaml for local use. Deploy the manifests with,

make cluster-minio

For custom install with private image registry,

make cluster-minio-custom

Start the MinIO console on localhost temporarily with, (permanently exposing the tenant CRD console service on a NodePort is a TODO)

kubectl port-forward service/minio-console 9443:9443 --namespace minio

You can also generate manifest files from helm chart with,

helm template minio --namespace minio-operator --create-namespace minio/minio-operator --output-dir minio

Delete MinIO

Delete MinIO from cluster,

make cluster-minio-delete

For custom mode,

make cluster-minio-custom-delete

ArgoCD

TODO

Vault

TODO

K10 or Velero

TODO

Shipa

TODO

SAST/DAST tool integration

TODO

Test automation

TODO

Full gitops pipeline

TODO

Cilium & Hubble

Will be explored later, as it conflicts with the CoreDNS pods when Calico is the CNI.

DNSUtils

Apply dnsutils manifest with,

make dnsutils

For custom install with private image registry,

make dnsutils-custom

Use dnsutils to check service availability for cluster-internal or external DNS addresses,

kubectl exec -i -t dnsutils -- nslookup kubernetes.default
kubectl exec -i -t dnsutils -- nslookup jenkins.jenkins.svc.cluster.local

Delete DNSUtils

Delete dnsutils from cluster,

make dnsutils-delete

For custom mode,

make dnsutils-custom-delete