Primary language: Open Policy Agent. License: Apache License 2.0 (Apache-2.0).

KubeEye

KubeEye aims to find various problems on Kubernetes, such as application misconfiguration (using OPA), unhealthy cluster components, and node problems (using Node-Problem-Detector). Besides predefined rules, it also supports custom rules.

Architecture

KubeEye gets cluster diagnostic data by calling the Kubernetes API, by regular-expression matching of key error messages in resources, and by rule matching of container syntax. See Architecture for details.

[Figure: KubeEye architecture]
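
As a rough illustration of the message-matching step described above, the sketch below maps a message to a checklist item name using extended regular expressions. The RULES table, the classify_message function, and the patterns themselves are hypothetical and chosen only for illustration; KubeEye's actual rules may differ.

```shell
# Hypothetical rule table, one rule per line:
# "<checklist item>|<extended regular expression>".
# The patterns are illustrative assumptions, not KubeEye's real rule set.
RULES='PodNoSpaceLeftOnDevice|no space left on device
PodTooManyOpenFiles|too many open files
PodNoSuchFileOrDirectory|no such file or directory
PodDeviceOrResourceBusy|device or resource busy'

# Print the first checklist item whose pattern matches the message in $1.
# Prints nothing when no rule matches.
classify_message() {
    printf '%s\n' "$RULES" | while IFS='|' read -r name pattern; do
        # -E: extended regex, -i: ignore case, -q: quiet (exit status only)
        if printf '%s\n' "$1" | grep -Eiq -- "$pattern"; then
            printf '%s\n' "$name"
            break
        fi
    done
}
```

For example, classify_message "write /data/x: No space left on device" prints PodNoSpaceLeftOnDevice, while a message that matches no rule produces no output.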

How to use

  • Install KubeEye on your machine
    • Download pre-built executables from Releases.

    • Or build from source code.

Note: make install installs the kubeeye binary in /usr/local/bin/ on your machine.

```shell
git clone https://github.com/kubesphere/kubeeye.git
cd kubeeye 
make install
```

Note: This command installs npd (Node-Problem-Detector) on your cluster. It is only required if you want a detailed report.

kubeeye install -e npd
  • Run KubeEye

Note: KubeEye sorts the results by resource kind.

root@node1:# kubeeye audit
NAMESPACE     NAME              KIND          MESSAGE
default       nginx             Deployment    [nginx CPU limits should be set. nginx CPU requests should be set. nginx image tag not specified, do not use 'latest'. nginx livenessProbe should be set. nginx memory limits should be set. nginx memory requests should be set. nginx priorityClassName can be set. nginx root file system should be set read only. nginx readinessProbe should be set. nginx runAsNonRoot can be set.]
default       testcronjob       CronJob       [testcronjob CPU limits should be set. testcronjob CPU requests should be set. testcronjob allowPrivilegeEscalation should be set false. testcronjob have HighRisk capabilities. testcronjob hostIPC should not be set. testcronjob hostNetwork should not be set. testcronjob hostPID should not be set. testcronjob hostPort should not be set. testcronjob imagePullPolicy should be set 'Always'. testcronjob image tag not specified, do not use 'latest'. testcronjob have insecure capabilities. testcronjob livenessProbe should be set. testcronjob memory limits should be set. testcronjob memory requests should be set. testcronjob priorityClassName can be set. testcronjob privileged should be set false. testcronjob root file system should be set read only. testcronjob readinessProbe should be set.]
kube-system   testrole          Role          [testrole can impersonate user. testrole can delete resources. testrole can modify workloads.]
              testclusterrole   ClusterRole   [testclusterrole can impersonate user. testclusterrole can delete resource. testclusterrole can modify workloads.]

NAMESPACE     SEVERITY   PODNAME                              EVENTTIME                   REASON    MESSAGE
kube-system   Warning    vpnkit-controller.16acd7f7536c62e8   2021-10-11T15:55:08+08:00   BackOff   Back-off restarting failed container

NODENAME        SEVERITY     HEARTBEATTIME               REASON              MESSAGE
node18          Fatal        2020-11-19T10:32:03+08:00   NodeStatusUnknown   Kubelet stopped posting node status.
node19          Fatal        2020-11-19T10:31:37+08:00   NodeStatusUnknown   Kubelet stopped posting node status.
node2           Fatal        2020-11-19T10:31:14+08:00   NodeStatusUnknown   Kubelet stopped posting node status.
node3           Fatal        2020-11-27T17:36:53+08:00   KubeletNotReady     Container runtime not ready: RuntimeReady=false reason:DockerDaemonNotReady message:docker: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

NAME            SEVERITY     TIME                        MESSAGE
scheduler       Fatal        2020-11-27T17:09:59+08:00   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0          Fatal        2020-11-27T17:56:37+08:00   Get https://192.168.13.8:2379/health: dial tcp 192.168.13.8:2379: connect: connection refused

Refer to the FAQ for guidance on optimizing your cluster.
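
In the sample output above, each workload or RBAC row carries all of its findings in one bracketed MESSAGE list, which is hard to scan or count. The sketch below splits such a list into one finding per line. The split_findings function is a hypothetical helper, not part of kubeeye, and it assumes the default output format shown above, where findings are separated by a period and a space.

```shell
# Hypothetical helper: read audit output on stdin and print each finding
# from the bracketed MESSAGE column on its own line.
# Assumes findings are separated by ". " as in the sample output above.
split_findings() {
    awk '
        # Rows without a bracketed list (headers, events, node rows) are skipped.
        match($0, /\[.*\]/) {
            # Text between the brackets.
            body = substr($0, RSTART + 1, RLENGTH - 2)
            # Drop the period that ends the last finding.
            sub(/\.$/, "", body)
            n = split(body, parts, /\. /)
            for (i = 1; i <= n; i++) print parts[i]
        }
    '
}
```

A possible use is kubeeye audit | split_findings | sort | uniq -c, which counts how often each finding occurs.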

What KubeEye can do

  • KubeEye validates your workload YAML specs against industry best practices and helps you keep your cluster stable.
  • KubeEye can find problems in your cluster control plane, including kube-apiserver, kube-controller-manager, etcd, etc.
  • KubeEye helps you detect all kinds of node problems, including memory/CPU/disk pressure, unexpected kernel error logs, etc.

Checklist

| YES/NO | CHECK ITEM | Description |
|---|---|---|
| | NodeDockerHung | Docker hung; check the Docker log |
| | PrivilegeEscalationAllowed | Privilege escalation is allowed |
| | CanImpersonateUser | The role/clusterrole can impersonate other users |
| | CanDeleteResources | The role/clusterrole can delete Kubernetes resources |
| | CanModifyWorkloads | The role/clusterrole can modify Kubernetes workloads |
| | NoCPULimits | The resource does not set CPU limits in containers.resources |
| | NoCPURequests | The resource does not set CPU requests in containers.resources |
| | HighRiskCapabilities | Has high-risk options in capabilities, such as ALL/SYS_ADMIN/NET_ADMIN |
| | HostIPCAllowed | HostIPC is set to true |
| | HostNetworkAllowed | HostNetwork is set to true |
| | HostPIDAllowed | HostPID is set to true |
| | HostPortAllowed | HostPort is set to true |
| | ImagePullPolicyNotAlways | The image pull policy is not Always |
| | ImageTagIsLatest | The image tag is latest |
| | ImageTagMiss | The image tag is not declared |
| | InsecureCapabilities | Has insecure options in capabilities, such as KILL/SYS_CHROOT/CHOWN |
| | NoLivenessProbe | The resource does not set livenessProbe |
| | NoMemoryLimits | The resource does not set memory limits in containers.resources |
| | NoMemoryRequests | The resource does not set memory requests in containers.resources |
| | NoPriorityClassName | The resource does not set priorityClassName |
| | PrivilegedAllowed | Running a pod in privileged mode means that the pod can access the host’s resources and kernel capabilities |
| | NoReadinessProbe | The resource does not set readinessProbe |
| | NotReadOnlyRootFilesystem | The resource does not set readOnlyRootFilesystem to true |
| | NotRunAsNonRoot | The resource does not set runAsNonRoot to true, so it may run as the root account |
| | ETCDHealthStatus | Whether etcd is up and running normally; if not, check the etcd status |
| | ControllerManagerHealthStatus | Whether kube-controller-manager is up and running normally; if not, check the kube-controller-manager status |
| | SchedulerHealthStatus | Whether kube-scheduler is up and running normally; if not, check the kube-scheduler status |
| | NodeMemory | Whether node memory usage is above the threshold; if so, check node memory usage |
| | DockerHealthStatus | Whether Docker is up and running; if not, check the Docker status |
| | NodeDisk | Whether node disk usage is above the given threshold; if so, check node disk usage |
| | KubeletHealthStatus | Whether kubelet is active and running normally |
| | NodeCPU | Whether node CPU usage is above the given threshold |
| | NodeCorruptOverlay2 | Overlay2 is not available |
| | NodeKernelNULLPointer | The node displays NotReady |
| | NodeDeadlock | A deadlock is a phenomenon in which two or more processes wait for each other while competing for resources |
| | NodeOOM | Monitors processes that consume too much memory, especially those that consume a lot of memory very quickly; the kernel kills them to keep the system from running out of memory |
| | NodeExt4Error | Ext4 mount error |
| | NodeTaskHung | Checks whether a process has been in state D for more than 120s |
| | NodeUnregisterNetDevice | Check the corresponding network |
| | NodeCorruptDockerImage | Check the Docker image |
| | NodeAUFSUmountHung | Check the storage |
| | PodSetImagePullBackOff | The pod cannot pull the image properly; the image can be pulled manually on the corresponding node |
| | PodNoSuchFileOrDirectory | Go into the container to see whether the corresponding file exists |
| | PodIOError | This is usually due to file I/O performance bottlenecks |
| | PodNoSuchDeviceOrAddress | Check the corresponding network |
| | PodInvalidArgument | Check the storage |
| | PodDeviceOrResourceBusy | Check the corresponding directory and PID |
| | PodFileExists | Check for existing files |
| | PodTooManyOpenFiles | The number of file/socket connections opened by the program exceeds the system limit |
| | PodNoSpaceLeftOnDevice | Check disk and inode usage |
| | NodeApiServerExpiredPeriod | Checks whether the ApiServer certificate expires in less than 30 days |
| | NodeNotReadyAndUseOfClosedNetworkConnection | http2-max-streams-per-connection |
| | NodeNotReady | Failed to start ContainerManager Cannot set property TasksAccounting, or unknown property |

Unmarked items are under heavy development.
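
To make the two image-tag items in the checklist concrete, the sketch below classifies an image reference as ImageTagIsLatest, ImageTagMiss, or OK. The check_image_tag function is hypothetical and is not KubeEye's actual OPA rule; in particular, treating a digest-pinned image as OK is an assumption.

```shell
# Hypothetical helper: classify the container image reference in $1
# according to the ImageTagIsLatest and ImageTagMiss checklist items.
check_image_tag() {
    # Keep only the part after the last "/" so that a registry port such as
    # "registry:5000/nginx" is not mistaken for a tag.
    last=${1##*/}
    case "$last" in
        *@*)      echo "OK" ;;                 # pinned by digest (assumed acceptable)
        *:latest) echo "ImageTagIsLatest" ;;   # tag is explicitly "latest"
        *:*)      echo "OK" ;;                 # some other explicit tag
        *)        echo "ImageTagMiss" ;;       # no tag declared
    esac
}
```

For example, check_image_tag nginx:latest prints ImageTagIsLatest, and check_image_tag registry:5000/nginx prints ImageTagMiss.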

Add your own audit rules

Add custom OPA rules

  • Create a directory for OPA rules
mkdir opa
  • Add custom OPA rule files

Note: The package name of an OPA rule must be kubeeye_workloads_rego for workloads, kubeeye_RBAC_rego for RBAC, and kubeeye_nodes_rego for nodes.

  • Save the following rule to a rule file, such as imageRegistryRule.rego, to audit whether the image registry address complies with the rules.
package kubeeye_workloads_rego

deny[msg] {
    resource := input
    type := resource.Object.kind
    resourcename := resource.Object.metadata.name
    resourcenamespace := resource.Object.metadata.namespace
    workloadsType := {"Deployment","ReplicaSet","DaemonSet","StatefulSet","Job"}
    workloadsType[type]

    not workloadsImageRegistryRule(resource)

    msg := {
        "Name": sprintf("%v", [resourcename]),
        "Namespace": sprintf("%v", [resourcenamespace]),
        "Type": sprintf("%v", [type]),
        "Message": "ImageRegistryNotmyregistry"
    }
}

workloadsImageRegistryRule(resource) {
    regex.match("^myregistry.public.kubesphere/basic/.+", resource.Object.spec.template.spec.containers[_].image)
}
  • Run KubeEye with custom rules

Note: Specify the path, and KubeEye will read all files in that directory that end with .rego.

root:# kubeeye audit -p ./opa -f ~/.kube/config
NAMESPACE     NAME              KIND          MESSAGE
default       nginx1            Deployment    [ImageRegistryNotmyregistry NotReadOnlyRootFilesystem NotRunAsNonRoot]
default       nginx11           Deployment    [ImageRegistryNotmyregistry PrivilegeEscalationAllowed HighRiskCapabilities HostIPCAllowed HostPortAllowed ImagePullPolicyNotAlways ImageTagIsLatest InsecureCapabilities NoPriorityClassName PrivilegedAllowed NotReadOnlyRootFilesystem NotRunAsNonRoot]
default       nginx111          Deployment    [ImageRegistryNotmyregistry NoCPULimits NoCPURequests ImageTagMiss NoLivenessProbe NoMemoryLimits NoMemoryRequests NoPriorityClassName NotReadOnlyRootFilesystem NoReadinessProbe NotRunAsNonRoot]
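
Before running an audit with custom rules, it can help to check the rule directory and the registry pattern locally. The sketch below assumes two hypothetical helpers that are not part of the kubeeye CLI: check_rego_packages verifies that each .rego file declares one of the three required package names, and image_in_registry tests an image reference against the same regular expression used in imageRegistryRule.rego.

```shell
# Hypothetical pre-flight helpers; neither is part of the kubeeye CLI.

# Report .rego files in directory $1 whose package declaration is not one of
# the three required package names. Returns non-zero if any file is reported.
check_rego_packages() {
    rc=0
    for f in "$1"/*.rego; do
        [ -e "$f" ] || continue    # glob did not match: no .rego files
        if ! grep -Eq '^package[[:space:]]+kubeeye_(workloads|RBAC|nodes)_rego[[:space:]]*$' "$f"; then
            echo "unexpected or missing package name: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Succeed if the image reference in $1 matches the regular expression
# copied verbatim from imageRegistryRule.rego.
image_in_registry() {
    printf '%s\n' "$1" | grep -Eq '^myregistry.public.kubesphere/basic/.+'
}
```

One way to combine this with the command above is check_rego_packages ./opa && kubeeye audit -p ./opa -f ~/.kube/config, so the audit only runs when every rule file declares an expected package.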

Contributors ✨

Thanks go to these wonderful people (emoji key):


  • ruiyaoOps 💻 📖
  • Forest 📖
  • zryfish 📖
  • shaowenchen 📖
  • pixiake 📖
  • pengfei 📖
  • Harsh Thakur 💻
  • leonharetd 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Documents