kubernetes/node-problem-detector

FYI - Simple remedy system designed for use with NPD

negz opened this issue · 6 comments

negz commented

Hello,

I wanted to bring Draino to your attention, in case it's useful to others. Draino is a very simple 'remedy' system for permanent problems detected by the Node Problem Detector - it simply cordons and drains nodes exhibiting configurable Node Conditions.

At Planet we run a small handful of Kubernetes clusters on GCE (not GKE). We have a particular analytics workload that is really good at killing GCE persistent volumes. Without going into too much detail, we see persistent volume related processes (mkfs.ext4, mount, etc) hanging forever in uninterruptible sleep, preventing the pods wanting to consume said volumes from running. We're working with GCP to resolve this issue, but in the meantime we got tired of manually cordoning and draining affected nodes, so we wrote Draino.

Our remedy system looks like:

  1. Detect permanent node problems and set Node Conditions using the Node Problem Detector.
  2. Configure Draino to cordon and drain nodes when they exhibit the NPD's KernelDeadlock condition, or a variant of KernelDeadlock we call VolumeTaskHung.
  3. Let the Cluster Autoscaler scale down underutilised nodes, including the nodes Draino has drained.

It's worth noting that once the Descheduler supports descheduling pods based on taints Draino could be replaced by the Descheduler running in combination with the scheduler's TaintNodesByCondition functionality.
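The cordon decision in step 2 can be sketched in a few lines of shell. This is illustrative logic only, not Draino's actual implementation, and the sample condition list is hypothetical:

```shell
# Conditions Draino is configured to act on (from step 2 above).
BAD_CONDITIONS="KernelDeadlock VolumeTaskHung"

# Hypothetical sample of a node's conditions as name=status pairs; on a real
# cluster these would come from the node's .status.conditions.
node_conditions="KernelDeadlock=True Ready=True"

should_cordon=false
for cond in $node_conditions; do
  name=${cond%%=*}
  status=${cond#*=}
  for bad in $BAD_CONDITIONS; do
    # Cordon if any configured condition is present with status True.
    if [ "$name" = "$bad" ] && [ "$status" = "True" ]; then
      should_cordon=true
    fi
  done
done
echo "cordon: $should_cordon"
```

Draino then drains the cordoned node, and the Cluster Autoscaler (step 3) eventually removes it.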

andyxning commented

@negz This is quite a good use case for NPD. I'll read through what you described in more detail later. Would you mind adding your use case to the usage section of the README?

This is exactly what NPD was originally proposed to do. Because remedy systems are end-user dependent, a common remedy system is not easily developed.

negz commented

@andyxning Thanks! I'd be happy to mention this use case in the README. Would it be too self-promotional to link to our Draino tool there? :)

@negz No. Draino is actually a POC of a remedy system based on NPD. :)

Could you please make a PR to add the use case?

@negz I have read the Draino code briefly. It seems quite good and is absolutely worth mentioning as an NPD use case. Please do not hesitate to add the Draino use case; I am willing to review it. :)

Hello. I'm using Draino to handle permanent problems detected by the Node Problem Detector: it simply cordons and drains nodes exhibiting the configured Node Conditions. I've configured Draino to cordon and drain a node when it exhibits the NPD's KernelDeadlock condition, or the KernelDeadlock variant called VolumeTaskHung.

Here is my example, shown below. I injected a fake kernel message reporting a task blocked for more than 300 seconds: echo "task docker:7 blocked for more than 300 seconds." | systemd-cat -t kernel

This triggers the kernel error rule and sets KernelDeadlock to True, but Draino doesn't react: my node is never marked unschedulable. Is something wrong with my setup?
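Assuming the injected line has to match an NPD kernel-monitor-style rule for the condition to flip, one way to sanity-check the message format is with grep (the pattern below is illustrative, not NPD's exact rule):

```shell
# Message injected into the journal (matches the condition message seen later
# in `kubectl describe node`).
msg='task docker:7 blocked for more than 300 seconds.'

# Illustrative DockerHung-style pattern; NPD's real rule lives in its
# kernel-monitor config and may differ.
pattern='task [[:alnum:]:]+ blocked for more than [0-9]+ seconds\.'

if echo "$msg" | grep -Eq "$pattern"; then matched=yes; else matched=no; fi
echo "matches: $matched"
```

If this prints `matches: no` for a real rule pattern, the injected line would never trigger the condition in the first place.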

This is my runtime environment

# kubectl get po -A |egrep  'node-problem-detector|draino'
kube-system   draino-58fc699f84-br2m2                     1/1     Running   0          17m
kube-system   node-problem-detector-smjw7                 1/1     Running   0          18m

My KernelDeadlock condition is True, so the rule has triggered, but Draino does not seem to drain:

# for node in `kubectl get node |sed '1d' |awk '{print $1}'`;do kubectl describe node $node |sed -n '/Conditions/,/Ready/p' ;done
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  KernelDeadlock       True    Sun, 30 Aug 2020 13:49:54 +0800   Sun, 30 Aug 2020 13:39:52 +0800   DockerHung                   task docker:7 blocked for more than 300 seconds.
  NetworkUnavailable   False   Tue, 25 Aug 2020 13:39:47 +0800   Tue, 25 Aug 2020 13:39:47 +0800   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Sun, 30 Aug 2020 13:49:54 +0800   Tue, 25 Aug 2020 13:39:10 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sun, 30 Aug 2020 13:49:54 +0800   Tue, 25 Aug 2020 13:39:10 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sun, 30 Aug 2020 13:49:54 +0800   Tue, 25 Aug 2020 13:39:10 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sun, 30 Aug 2020 13:49:54 +0800   Thu, 27 Aug 2020 07:17:14 +0800   KubeletReady                 kubelet is posting ready status
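One way to pull a single condition's status out of that output is with awk. The heredoc below is a trimmed stand-in for `kubectl describe node $node` on a live cluster:

```shell
# Extract the KernelDeadlock status column from describe-style output.
# The heredoc mimics the Conditions table shown above (timestamps trimmed).
status=$(awk '$1 == "KernelDeadlock" {print $2}' <<'EOF'
  KernelDeadlock       True    Sun, 30 Aug 2020 13:49:54 +0800   DockerHung
  Ready                True    Sun, 30 Aug 2020 13:49:54 +0800   KubeletReady
EOF
)
echo "KernelDeadlock=$status"
```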

Draino isn't working: my node is never set unschedulable and my pods are not evicted.

# kubectl get events -n kube-system | grep -E '(^LAST|draino)'
LAST SEEN   TYPE     REASON              OBJECT                            MESSAGE
<unknown>   Normal   Scheduled           pod/draino-58fc699f84-br2m2       Successfully assigned kube-system/draino-58fc699f84-br2m2 to master
18m         Normal   Pulling             pod/draino-58fc699f84-br2m2       Pulling image "planetlabs/draino:5e07e93"
18m         Normal   Pulled              pod/draino-58fc699f84-br2m2       Successfully pulled image "planetlabs/draino:5e07e93"
18m         Normal   Created             pod/draino-58fc699f84-br2m2       Created container draino
18m         Normal   Started             pod/draino-58fc699f84-br2m2       Started container draino
18m         Normal   SuccessfulCreate    replicaset/draino-58fc699f84      Created pod: draino-58fc699f84-br2m2
18m         Normal   ScalingReplicaSet   deployment/draino                 Scaled up replica set draino-58fc699f84 to 1
# kubectl get no
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   5d    v1.18.0
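To confirm whether Draino actually cordoned the node, `.spec.unschedulable` can be inspected. A minimal sketch, with the `kubectl` call commented out since it needs a live cluster; the hypothetical JSON sample stands in for its output:

```shell
# On a real cluster:
#   node_json=$(kubectl get node master -o json)
# Hypothetical sample matching the report above: spec has no unschedulable
# flag, i.e. the node was never cordoned.
node_json='{"spec":{}}'

case "$node_json" in
  *\"unschedulable\":true*) cordoned=yes ;;
  *)                        cordoned=no ;;
esac
echo "cordoned: $cordoned"
```

If this reports `cordoned: no` while KernelDeadlock is True, Draino either is not watching that condition or cannot reach the API server, which its logs should show.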