kubetest
A CLI for efficient use of Kubernetes cluster resources through distributed processing of time-consuming tasks.
This tool is built around the following concepts.
Distributed processing: time-consuming tasks are divided according to certain rules, and cluster resources are used efficiently by processing each task in a different Pod.
One container per task: because each divided task runs in its own container, a task is less affected by the processing of other tasks.
Installation
$ go install github.com/goccy/kubetest/cmd/kubetest@latest
How to use
Usage:
kubetest [OPTIONS]
Application Options:
-n, --namespace= specify namespace (default: default)
--in-cluster specify whether in cluster
-c, --config= specify local kubeconfig path. ( default: $HOME/.kube/config )
--list= specify path to get the list for test
--log-level= specify log level (debug/info/warn/error)
--dry-run specify dry run mode
--template= specify template parameter for testjob file
-o, --output= specify output path of report
Help Options:
-h, --help Show this help message
1. Run a simple task
First, we will introduce a sample that performs the simplest task processing.
Describe the task in a manifest file as follows and execute it by passing the file as an argument to the kubetest CLI.
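Below is a minimal sketch of such a manifest, based only on the fields described in this document; the apiVersion, kind, and the exact nesting of the Job-like template under mainStep are assumptions and may differ from the actual kubetest schema.
apiVersion: kubetest.io/v1              # assumed group/version
kind: TestJob                           # assumed kind name
metadata:
  name: simple-testjob
  namespace: default
spec:
  mainStep:
    template:                           # same shape as a Kubernetes Job pod template
      spec:
        containers:
          - name: test
            image: alpine:3.19
            command: ["echo"]
            args: ["hello kubetest"]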
If you've already written a Kubernetes Job, you've probably noticed that the spec under mainStep in the simplest example is the same as for a Kubernetes Job :)
You'll often want to use code and data versioned with git when processing tasks.
In kubetest, you can write the repository definition in repos and reference it in volumes. A repository defined in volumes can be mounted in any container by using volumeMounts, just like an emptyDir.
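A sketch of how this might look; the names repos, volumes, and volumeMounts come from this document, while the repository sub-fields (url, branch) and the repo volume source are assumptions.
spec:
  repos:
    - name: main-repo                   # referenced from volumes below
      url: https://github.com/goccy/kubetest.git   # assumed sub-field names
      branch: master
  mainStep:
    template:
      spec:
        containers:
          - name: test
            image: alpine:3.19
            command: ["ls", "/go/src/kubetest"]
            volumeMounts:
              - name: repo-volume
                mountPath: /go/src/kubetest        # the repository is mounted here, like an emptyDir
        volumes:
          - name: repo-volume
            repo:                       # assumed: a volume source that refers to a repo by name
              name: main-repo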
You can also use a private repository with kubetest.
You can define a GitHub personal access token or a GitHub App token in tokens.
The GitHub personal access token data or GitHub App key data is managed as a Kubernetes Secret.
kubetest obtains the token by referring to it.
By specifying the name of the token to be used in the private repository definition, in the form token: github-app-token, the repository will be cloned using that token.
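A sketch of a token definition and a private repository that uses it; tokens, repos, and the token: github-app-token form come from this document, while the GitHub App fields and the Secret reference sub-fields are assumptions.
spec:
  tokens:
    - name: github-app-token
      githubApp:                        # assumed field names for a GitHub App token
        appId: 12345                    # hypothetical App ID
        keyFile:
          name: github-app-secret       # Kubernetes Secret holding the App private key
          key: private-key
  repos:
    - name: private-repo
      url: https://github.com/example/private-repo.git   # hypothetical private repository
      branch: main
      token: github-app-token           # clone using the token defined above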
In addition, the token can be mounted at any path using volumeMounts by writing the following in volumes. By combining this with preStep, which is described later, you can arrange things so that no token is needed when processing the main task, which makes task processing more secure.
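A sketch of mounting the token at a path in a container; the token volume source shown here is an assumption.
spec:
  mainStep:
    template:
      spec:
        containers:
          - name: setup
            image: alpine:3.19
            command: ["sh", "-c", "cat /etc/github/token"]
            volumeMounts:
              - name: token-volume
                mountPath: /etc/github/token
        volumes:
          - name: token-volume
            token:                      # assumed: a volume source that refers to a token by name
              name: github-app-token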
If any pre-processing is required before the main task runs, you can define it in preSteps and pass only the results to the subsequent tasks.
By making effective use of this step, the pre-processing required by each distributed process can be run only once, and cluster resources can be used efficiently.
Multiple preSteps can be defined and are executed in order, so the result of the previous step can be used in the next step.
The artifacts created by a preStep can be reused in subsequent task processing by describing, in the artifacts spec, the container name and the path where the artifacts exist.
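A sketch of a preStep that builds a binary once and exports it as an artifact; preSteps and artifacts come from this document, while the build command is hypothetical and the exact spelling and placement of the artifact sub-fields are assumptions.
spec:
  preSteps:
    - name: build
      template:
        spec:
          containers:
            - name: build
              image: golang:1.22
              command: ["go", "build", "-o", "/work/app", "./cmd/app"]   # hypothetical build command
      artifacts:                        # assumed placement of the artifacts spec
        - name: app-binary              # name referenced later from volumes
          container:
            name: build                 # container in which the artifact was created
            path: /work/app             # path where the artifact exists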
If you want to use artifacts that have already been created, you can write the name of the defined artifact in volumes as follows. As with repositories, you can use volumeMounts to mount it at any path.
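A sketch of consuming that artifact in the main task, again assuming a volume source that refers to an artifact by name.
spec:
  mainStep:
    template:
      spec:
        containers:
          - name: test
            image: alpine:3.19
            command: ["/work/app"]
            volumeMounts:
              - name: artifact-volume
                mountPath: /work/app    # mount the prebuilt binary at any path
        volumes:
          - name: artifact-volume
            artifact:                   # assumed: a volume source that refers to an artifact by name
              name: app-binary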
This section describes distributed processing, which is the main feature of kubetest.
Distributed processing is realized by defining a distributed key and passing that value as an environment variable to different tasks.
The distributed key can be determined statically or dynamically.
The following explains the pattern where the keys are determined statically.
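A sketch of the static pattern; strategy, key, scheduler, maxContainersPerPod, and maxConcurrentNumPerPod come from this document, while env, static, and the placement of strategy directly under spec (chosen to match the spec.template.spec.containers[].command path used below) are assumptions.
spec:
  strategy:
    key:
      env: TASK_KEY                     # assumed field name: the environment variable each task reads
      source:
        static:                         # assumed field name: statically determined distributed keys
          - TASK_KEY_1
          - TASK_KEY_2
          - TASK_KEY_3
    scheduler:
      maxContainersPerPod: 10           # up to 10 containers per Pod
      maxConcurrentNumPerPod: 10        # up to 10 containers processing tasks concurrently per Pod
  template:
    spec:
      containers:
        - name: task
          image: alpine:3.19
          command: ["sh", "-c", "echo $TASK_KEY"]   # each task sees one of TASK_KEY_1..TASK_KEY_3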
Describe the definition of distributed execution under strategy, as in the example above.
key defines the name of the environment variable to be referenced as the distributed key and the values of the distributed keys themselves.
In this example, if you refer to the environment variable named TASK_KEY, you will get one of the values from TASK_KEY_1 to TASK_KEY_3.
After that, define a command that uses the value of this environment variable in spec.template.spec.containers[].command.
In strategy.scheduler, define how resources such as Pods and containers are used for distributed execution.
In this example, maxContainersPerPod is 10, which means that up to 10 containers can be launched per Pod, and maxConcurrentNumPerPod is also 10, which means that 10 containers can process tasks at the same time per Pod.
Since the number of distributed keys is 3, only one Pod will be launched, but if the number of distributed keys exceeds 10, two Pods will be launched to process them.
Similarly, if you set maxContainersPerPod to 1, only one container will be started per Pod, so three Pods will be started to process the tasks.
Use strategy.key.source.dynamic to create a distributed key dynamically.
The distributed keys are the output of the command defined here, split on newline characters. (There are also options for changing how the output is split and for filtering out unnecessary results.)
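A sketch of the dynamic pattern; strategy.key.source.dynamic comes from this document, while the env field name and the nesting of the container definition are assumptions, and the splitting and filtering options are omitted because their field names are not given here.
spec:
  strategy:
    key:
      env: TEST_FILE                    # hypothetical environment variable name
      source:
        dynamic:
          spec:                         # assumed nesting of the container that produces the keys
            containers:
              - name: list
                image: golang:1.22
                command: ["go"]
                args: ["list", "./..."]   # hypothetical command; each output line becomes one distributed key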