
Managing Kubernetes Quota with confidence

Primary language: Go. License: MIT.




Kotary



What is it?

Kotary is an operator that adds a layer of verification and policy on top of the native ResourceQuota mechanism. It introduces a new resource, the ResourceQuotaClaim, which lets users request changes to the specification of their quota. The verification works as follows:

  • There must be enough resources (CPU and Memory) on the cluster to allocate the claim. Kotary takes into account the total resources of the cluster (worker nodes) and the sum of all the other ResourceQuotas.
  • (Optional) The claim must respect a maximum bound, expressed as a ratio of the cluster resources; for example, a namespace cannot claim more than 1/3 of the cluster.
  • (Optional) For flexibility, an over-commit or under-commit ratio can be set to define what is claimable compared to the actual resources. For example, in a development environment you could allow reserving more resources than are actually usable.
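The two capacity checks above can be sketched in Go. This is a minimal sketch, not Kotary's implementation: the `Check` type and `Verify` method are hypothetical, and the sketch assumes the over-commit ratio scales the cluster total while the max-allocation ratio applies to the real cluster total (an assumption that matches the numbers in the kotaplan summary shown in the Plan section).

```go
package main

import "fmt"

// Check holds the inputs of one verification, all in the same unit
// (millicores for CPU, bytes for Memory). Names are hypothetical.
type Check struct {
	ClusterTotal    int64   // allocatable resources of the worker nodes
	OtherQuotas     int64   // sum of all the other ResourceQuotas
	RatioOverCommit float64 // e.g. 1.3 makes 130% of the real capacity claimable
	RatioMaxAlloc   float64 // e.g. 0.33 caps one namespace at a third of the cluster
}

// Verify returns nil when the claim passes both checks.
func (c Check) Verify(claim int64) error {
	// 1. Enough resources left once every other quota is counted.
	claimable := int64(float64(c.ClusterTotal) * c.RatioOverCommit)
	if c.OtherQuotas+claim > claimable {
		return fmt.Errorf("not enough resources: claiming %d but only %d left",
			claim, claimable-c.OtherQuotas)
	}
	// 2. Per-namespace upper bound, expressed as a ratio of the cluster.
	maxPerNS := int64(float64(c.ClusterTotal) * c.RatioMaxAlloc)
	if claim > maxPerNS {
		return fmt.Errorf("exceeded allocation limit: claiming %d but limited to %d",
			claim, maxPerNS)
	}
	return nil
}

func main() {
	c := Check{ClusterTotal: 64000, OtherQuotas: 40000, RatioOverCommit: 1.0, RatioMaxAlloc: 0.5}
	fmt.Println(c.Verify(10000)) // <nil>
	fmt.Println(c.Verify(30000)) // not enough resources: claiming 30000 but only 24000 left
}
```

Setting `RatioOverCommit` below 1 gives the under-commit case with the same code.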

To ease the adoption of ResourceQuotaClaims, a default claim can be enforced for namespaces. This feature is activated on namespaces that carry the label quota=managed.
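The opt-in rule can be illustrated with a small Go sketch. The `Claim` type and `defaultClaimFor` helper are hypothetical simplifications, not Kotary's API; only the label quota=managed and the default values (cpu 2, memory 6Gi, from the options table) come from this document.

```go
package main

import "fmt"

// Claim is a hypothetical, simplified ResourceQuotaClaim spec.
type Claim struct {
	CPU    string
	Memory string
}

// defaultClaimFor returns the default claim and true when the namespace
// carries the label quota=managed; otherwise it returns false.
func defaultClaimFor(labels map[string]string, def Claim) (Claim, bool) {
	if labels["quota"] != "managed" {
		return Claim{}, false
	}
	return def, true
}

func main() {
	def := Claim{CPU: "2", Memory: "6Gi"} // defaults from the options table
	fmt.Println(defaultClaimFor(map[string]string{"quota": "managed"}, def)) // {2 6Gi} true
	fmt.Println(defaultClaimFor(map[string]string{"team": "a"}, def))        // { } false
}
```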

Why not use an admission controller?

Using the Kubernetes admission controller mechanism could have been an elegant solution: it would have avoided a Custom Resource Definition by letting users modify ResourceQuotas directly. However, it would have left out users of managed clusters such as EKS, AKS, or GKE, which is why we implemented the operator pattern instead.

Adding or Scaling-Up a Quota

How the Kotary verification process works when you add a claim or request a scale-up.

Installation

Add the CRD

kubectl apply -f https://raw.githubusercontent.com/ca-gip/kotary/master/artifacts/crd.yml

Configuration

Options
Name                      Description                                                   Mandatory  Type          Default
defaultClaimSpec          Default claim added to a watched Namespace                    no         ResourceList  cpu: 2, memory: 6Gi
ratioMaxAllocationMemory  Maximum share of the cluster Memory claimable by a Namespace  no         Float         1
ratioMaxAllocationCPU     Maximum share of the cluster CPU claimable by a Namespace     no         Float         1
ratioOverCommitMemory     Memory over-commit ratio                                      no         Float         1
ratioOverCommitCPU        CPU over-commit ratio                                         no         Float         1
Example

In the following sample configuration, we set:

  • A default claim of 2 CPU and 10Gi of Memory
  • A namespace can claim at most 33% of the total cluster resources
  • An over-commit of 130%
cat <<EOF | kubectl -n kube-system create -f -
apiVersion: v1
kind: ConfigMap
data:
  defaultClaimSpec: |
    cpu: "2"
    memory: "10Gi"
  ratioMaxAllocationMemory: "0.33"
  ratioMaxAllocationCPU: "0.33"
  ratioOverCommitMemory: "1.3"
  ratioOverCommitCPU: "1.3"
metadata:
  name: kotary-config
EOF
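To make the defaults concrete, here is a Go sketch of reading the ratio keys of kotary-config, falling back to 1 when a key is absent, as the options table states. The `Config` struct and `LoadConfig` function are hypothetical, not Kotary's code; `defaultClaimSpec` is left out because it is a YAML block that would need a YAML parser.

```go
package main

import (
	"fmt"
	"strconv"
)

// Config holds the ratio options of the kotary-config ConfigMap.
// The struct is hypothetical; only the key names come from the options table.
type Config struct {
	RatioMaxAllocationMemory float64
	RatioMaxAllocationCPU    float64
	RatioOverCommitMemory    float64
	RatioOverCommitCPU       float64
}

// LoadConfig parses the ConfigMap data, using the documented default of 1
// for every ratio that is not set.
func LoadConfig(data map[string]string) (Config, error) {
	var c Config
	fields := []struct {
		key string
		dst *float64
	}{
		{"ratioMaxAllocationMemory", &c.RatioMaxAllocationMemory},
		{"ratioMaxAllocationCPU", &c.RatioMaxAllocationCPU},
		{"ratioOverCommitMemory", &c.RatioOverCommitMemory},
		{"ratioOverCommitCPU", &c.RatioOverCommitCPU},
	}
	for _, f := range fields {
		raw, ok := data[f.key]
		if !ok {
			*f.dst = 1 // documented default
			continue
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return Config{}, fmt.Errorf("%s: %w", f.key, err)
		}
		*f.dst = v
	}
	return c, nil
}

func main() {
	// Ratios from the sample ConfigMap.
	cfg, err := LoadConfig(map[string]string{
		"ratioMaxAllocationMemory": "0.33",
		"ratioMaxAllocationCPU":    "0.33",
		"ratioOverCommitMemory":    "1.3",
		"ratioOverCommitCPU":       "1.3",
	})
	fmt.Println(cfg, err) // {0.33 0.33 1.3 1.3} <nil>
}
```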

Deployment

Deploy the controller
kubectl apply -f https://raw.githubusercontent.com/ca-gip/kotary/master/artifacts/deployment.yml
(Optional) Deploy the service monitor
kubectl apply -f https://raw.githubusercontent.com/ca-gip/kotary/master/artifacts/metrics.yml

Getting Started

Update a ResourceQuota

To update a ResourceQuota, create a ResourceQuotaClaim that specifies CPU and Memory. You can use the same units as in Kubernetes; please refer to the official documentation.

Example

cat <<EOF | kubectl apply -n demo-ns -f -
apiVersion: cagip.github.com/v1
kind: ResourceQuotaClaim
metadata:
  name: demo
spec:
  memory: 20Gi
  cpu: 5
EOF
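The quantities in a claim (20Gi, 5, or values such as 500m for CPU) follow Kubernetes notation. As a rough illustration only, the Go sketch below converts a small subset of those suffixes to plain numbers; `parseQuantity` is a hypothetical helper, and real code should rely on `resource.ParseQuantity` from `k8s.io/apimachinery/pkg/api/resource`, which handles the full syntax exactly.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// A small subset of the Kubernetes quantity suffixes (simplified on purpose).
var suffixes = []struct {
	suffix string
	mult   float64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
	{"k", 1e3}, {"M", 1e6}, {"G", 1e9}, {"T", 1e12},
	{"m", 1e-3}, // milli, e.g. "500m" CPU
}

// parseQuantity converts a quantity such as "20Gi", "5" or "500m" to a plain number.
func parseQuantity(q string) (float64, error) {
	q = strings.TrimSpace(q)
	mult := 1.0
	for _, s := range suffixes {
		if strings.HasSuffix(q, s.suffix) {
			q, mult = strings.TrimSuffix(q, s.suffix), s.mult
			break
		}
	}
	n, err := strconv.ParseFloat(q, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid quantity: %w", err)
	}
	return n * mult, nil
}

func main() {
	mem, _ := parseQuantity("20Gi")
	cpu, _ := parseQuantity("5")
	fmt.Printf("%.0f bytes, %g CPU\n", mem, cpu) // 21474836480 bytes, 5 CPU
}
```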

Status

After a ResourceQuotaClaim is created, there are three possible outcomes:

  • Accepted: the claim is deleted and the modifications are applied to the ResourceQuota.
  • Rejected: the modification could not be accepted; the claim shows the status "REJECTED" with details.
  • Pending: the claim requests fewer resources than are currently requested in the namespace; it will be accepted once downscaling is possible.
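The three outcomes can be summarized as a small decision function. This Go sketch is hypothetical and simplified to a single resource: `decide` and the `ACCEPTED` label are not from Kotary (an accepted claim is simply deleted), while "REJECTED" and "PENDING" match the examples that follow.

```go
package main

import "fmt"

// Status values; "ACCEPTED" is a hypothetical label for illustration.
type Status string

const (
	Accepted Status = "ACCEPTED"
	Rejected Status = "REJECTED"
	Pending  Status = "PENDING"
)

// decide classifies a claim for one resource (same unit for all arguments):
//   claim:     amount requested by the ResourceQuotaClaim
//   requested: total of the container requests currently in the namespace
//   limit:     the most the verification allows this namespace to claim
func decide(claim, requested, limit int64) Status {
	switch {
	case claim > limit:
		return Rejected // fails the verification; details are reported on the claim
	case claim < requested:
		return Pending // downscale waits until running pods request less
	default:
		return Accepted // quota updated, claim deleted
	}
}

func main() {
	fmt.Println(decide(20, 10, 18)) // REJECTED
	fmt.Println(decide(16, 18, 30)) // PENDING
	fmt.Println(decide(16, 10, 18)) // ACCEPTED
}
```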
Example of a rejected claim
$ kubectl get quotaclaim
NAME   CPU   RAM    STATUS     DETAILS
demo   5     20Gi   REJECTED   Exceeded Memory allocation limit claiming 20Gi but limited to 18Gi
Example of a pending claim
$ kubectl get quotaclaim
NAME   CPU   RAM    STATUS     DETAILS
demo   5     16Gi   PENDING    Awaiting lower Memory consumption claiming 16Gi but current total of Memory request is 18Gi

Default claim

If you use the default claim policy, namespaces automatically receive a claim and, if all verifications pass, a ResourceQuota named managed-quota is applied.

$ kubectl get resourcequota
NAME            CREATED AT
managed-quota   2020-01-24T08:31:32Z

Plan

Implementing ResourceQuotas when workloads are already running on your cluster can be tedious. To help you get started, you can use our CLI, kotaplan. It lets you test various scenarios beforehand by simulating how the quotas would be applied with the desired settings.

Here is a quick example:

$ kotaplan -label quota=managed -cpuover 0.95 -memover 0.95 -memratio 0.33 -cpuratio 0.33
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Namespaces Details                                                                                                                                                                    |
+--------------+------+---------+-----------------+---------------+---------+-----------------+---------------+-------------+---------------------------+-------------------------------+
| NAMESPACE    | PODS | MEM REQ | CURRENT MEM USE | MEM REQ USAGE | CPU REQ | CURRENT CPU USE | CPU REQ USAGE | FIT DEFAULT | RESPECT MAX ALLOCATION NS | SPEC                          |
+--------------+------+---------+-----------------+---------------+---------+-----------------+---------------+-------------+---------------------------+-------------------------------+
| team-1-dev   |   14 | 7GiB    | 859.4MiB        | 11.9897 %     |    2800 |               6 | 0.2143 %      | false       | true                      | CPU : 3360m    MEM: 8.4GiB    |
| team-1-int   |   14 | 7GiB    | 852MiB          | 11.8868 %     |    2800 |               0 | 0 %           | false       | true                      | CPU : 3360m    MEM: 8.4GiB    |
| team-1-prd   |   16 | 8GiB    | 1.125GiB        | 14.0568 %     |    3200 |               0 | 0 %           | false       | true                      | CPU : 3840m    MEM: 9.6GiB    |
| team-2-dev   |   16 | 8GiB    | 1.119GiB        | 13.9935 %     |    3200 |               0 | 0 %           | false       | true                      | CPU : 3840m    MEM: 9.6GiB    |
| team-2-dev   |    8 | 4GiB    | 531.4MiB        | 12.9745 %     |    1600 |               0 | 0 %           | false       | true                      | CPU : 1920m    MEM: 6GiB      |
| team-2-dev   |   28 | 9GiB    | 1.033GiB        | 11.4748 %     |    3600 |               6 | 0.1667 %      | false       | true                      | CPU : 4320m    MEM: 10.8GiB   |
| team-3-dev   |    0 | 0B      | 0B              | 0 %           |       0 |               0 | 0 %           | true        | true                      | CPU : 1000m    MEM: 6GiB      |
| team-3-prd   |    0 | 0B      | 0B              | 0 %           |       0 |               0 | 0 %           | true        | true                      | CPU : 1000m    MEM: 6GiB      |
+--------------+------+---------+-----------------+---------------+---------+-----------------+---------------+-------------+---------------------------+-------------------------------+
| 8            |   96 |         |                 |               |         |                 |               |             |                           | CPU : 22692M    MEM: 64.8 GIB |
+--------------+------+---------+-----------------+---------------+---------+-----------------+---------------+-------------+---------------------------+-------------------------------+
+-------------------------------------------------------------+
| Summary                                                     |
+-------------------------------------------------------------+
| Number of nodes               8                             |
| Available resources (real)    CPU : 64000m    MEM: 250.5GiB |
| Available resources (commit)  CPU : 60800m    MEM: 238GiB   |
| Max per NS                    CPU : 21120m    MEM: 82.67GiB |
+-------------------------------------------------------------+
| RESULT                        OK                            |
+-------------------------------------------------------------+
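The Summary figures in this sample are consistent with simple products of the real capacity: 64000m × 0.95 = 60800m committed and 64000m × 0.33 = 21120m per namespace, and likewise 250.5GiB × 0.95 ≈ 238GiB and 250.5GiB × 0.33 ≈ 82.67GiB. The Go sketch below reproduces that arithmetic; the formulas are inferred from this output, not taken from kotaplan's source, and `summary` is a hypothetical helper.

```go
package main

import "fmt"

// summary reproduces two rows of the kotaplan Summary table (inferred):
//   commit   = real capacity * over-commit ratio (-cpuover / -memover)
//   maxPerNS = real capacity * allocation ratio  (-cpuratio / -memratio)
func summary(capacity, over, ratio float64) (commit, maxPerNS float64) {
	return capacity * over, capacity * ratio
}

func main() {
	cpuCommit, cpuMax := summary(64000, 0.95, 0.33) // millicores
	memCommit, memMax := summary(250.5, 0.95, 0.33) // GiB
	fmt.Printf("CPU: commit %.0fm, max per NS %.0fm\n", cpuCommit, cpuMax)
	fmt.Printf("MEM: commit %.2fGiB, max per NS %.2fGiB\n", memCommit, memMax)
}
```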

Manage

To help you manage your ResourceQuotas effectively, you can use the provided Grafana dashboard. You can adapt it to your configuration by modifying the dashboard variables.

Global

The Global section lets users check the current running configuration (set manually) so they can size their claims accordingly. It also shows which resources are currently available, reserved, and claimable.

(Screenshot: Grafana dashboard, Global section)

Namespaces

This section lists all the managed namespaces to give a rough idea of what running containers currently use. It lets you quickly check which quotas should be increased or decreased.

(Screenshot: Grafana dashboard, Namespaces section)

Namespace Details

This section shows a detailed view of Memory and CPU consumption for a particular namespace. It lets you visually compare the quota specification, the total requests made by containers, and their actual consumption.

(Screenshot: Grafana dashboard, Namespace Details section)



Versioning

Since version v1.24.0, versions follow a naming scheme that is easier to read and understand. For example, v1.24.0 means that the operator was developed for Kubernetes 1.24, and the final 0 counts the patches we have made to the operator.
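The scheme can be decoded mechanically. In this Go sketch, `parseVersion` is a hypothetical helper, and it is only meaningful for tags from v1.24.0 onward, since earlier tags did not follow this convention.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a Kotary tag such as "v1.24.0" into the targeted
// Kubernetes version ("1.24") and the operator patch number (0).
func parseVersion(tag string) (kubernetes string, patch int, err error) {
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return "", 0, fmt.Errorf("unexpected version format %q", tag)
	}
	if patch, err = strconv.Atoi(parts[2]); err != nil {
		return "", 0, fmt.Errorf("invalid patch number in %q: %w", tag, err)
	}
	return parts[0] + "." + parts[1], patch, nil
}

func main() {
	k8s, patch, err := parseVersion("v1.24.0")
	fmt.Println(k8s, patch, err) // 1.24 0 <nil>
}
```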