Improve collectd collector performance by detecting hostPath mount errors
banjoh opened this issue · 1 comments
Describe the rationale for the suggested feature.
Whenever the collectd collector runs, it mounts the /var/lib/collectd host path. If the path does not exist, the pod gets stuck in the ContainerCreating state until it is forcefully terminated. This wastes a lot of time: the collector runs for 90s unnecessarily.
Describe the feature
We need to figure out how to detect that a pod is failing because the /var/lib/collectd directory cannot be found, and then stop the collector gracefully.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 59s default-scheduler Successfully assigned default/troubleshoot-copyfromhost-pgw7w-t2jsc to k3d-mycluster-server-0
Warning FailedMount 27s (x7 over 59s) kubelet MountVolume.SetUp failed for volume "host" : hostPath type check failed: /var/lib/collectd is not a directory
Additional context
This collector pod gets launched using a DaemonSet. This means that there is a pod restart policy to consider. We do not want to set it to Never, because there may be legitimate intermittent conditions stopping the pod from starting.
I think we can use pod events to check whether the pod has a FailedMount event, and then terminate it immediately. I have added this to the PR.