Splunk Operator: slow mounting of ebs volume hence pod is keeping "container creating" state for too long
yaroslav-nakonechnikov opened this issue · 16 comments
Please select the type of request
Enhancement
Tell us more
Describe the request
We use quite large EBS volumes (10 TB+) for indexers, and sometimes a pod has to move to another node. We found that mounting the EBS volume and starting the pod takes too much time.
In our case it takes 70 minutes just to start the pod after it is assigned to a node.
After investigation, we found that by default k8s recursively changes the ownership and permissions of the volume contents to match fsGroup each time the volume is mounted. ref: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods
On volumes of this size that takes a lot of time.
Expected behavior
The documentation mentions this problem, with some examples of how to solve it, and the CRD has a default value of fsGroupChangePolicy = "OnRootMismatch".
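For reference, this is roughly what the setting looks like at the pod level, following the Kubernetes documentation linked above. It is only a sketch: the pod name, image tag, UID/GID values, mount path, and PVC name are placeholders chosen for illustration, not values taken from the operator.

```yaml
# Illustrative pod spec; all names and numeric IDs below are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-indexer                 # hypothetical name
spec:
  securityContext:
    runAsUser: 1000                     # placeholder UID
    fsGroup: 1000                       # placeholder GID that should own the volume
    # Skip the recursive ownership/permission change when the volume root
    # already matches fsGroup. The Kubernetes default is "Always".
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: splunk
      image: splunk/splunk:latest       # placeholder image
      volumeMounts:
        - name: data
          mountPath: /opt/splunk/var    # placeholder mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-indexer-data # hypothetical PVC
```

With "OnRootMismatch", the full recursive pass should only run when the root directory of the volume does not already have the expected owner and permissions, which is the case we care about when a large, already-populated volume moves to another node.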
@yaroslav-nakonechnikov just wanted to check what version of splunk operator you are using
@vivekr-splunk the CRD hasn't changed much since the beginning. But I'd say 2.4 and 2.5 don't have that feature.
@vivekr-splunk @akondur Splunk support ticket has also been raised for that matter. Please refer to the following case number "CASE [3423864]".
Hi @yaroslav-nakonechnikov, is the request here to change the fsGroupChangePolicy to OnRootMismatch?
The request is to add support for it and to inform users about potential issues with big volumes.
As a result, it could be made the default, since from my perspective it doesn't look necessary to change permissions on each mount.
@yaroslav-nakonechnikov, have you tried changing the fsGroupChangePolicy to OnRootMismatch and checking whether that fixes the issue in your environment? This can be done by manually disabling the operator (temporarily) and testing it on one of your Splunk instances. We are currently evaluating the option on our end.
@akondur how? Any change to the statefulset/pod leads to it being recreated, and the CRD doesn't have that option.
@yaroslav-nakonechnikov You could create a simple Splunk StatefulSet that attaches to EBS volumes and try reproducing the issue, after which you can change the policy to see whether the behavior changes. Alternatively, before moving the pods to other nodes, you could delete the operator temporarily and edit the StatefulSet.
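As a rough sketch of the StatefulSet edit, assuming the operator has been stopped first so that it does not revert the change, a patch along these lines could be used. The file name is hypothetical, and the patch only touches the pod template's security context:

```yaml
# fsgroup-policy-patch.yaml (hypothetical file name)
# Strategic merge patch for a StatefulSet: sets the policy in the pod template
# and leaves the other securityContext fields (runAsUser, fsGroup, ...) as they are.
spec:
  template:
    spec:
      securityContext:
        fsGroupChangePolicy: "OnRootMismatch"
```

It could be applied with, for example, `kubectl patch statefulset <name> -n <namespace> --patch-file fsgroup-policy-patch.yaml`, where `<name>` and `<namespace>` stand for your own resources. Changing the pod template causes the StatefulSet controller to recreate the pods, so this is best done right before the planned node change.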
@akondur in that case, why can't you recheck it yourselves, if you already know what to recheck and how?
I reported the problem as a customer. Now it is your turn to take it from there and reproduce it.
Honestly, I don't understand why I have to spin up another cluster with another 11 TB of disks and fill it all with dummy data. Will you pay for it?
Hello @yaroslav-nakonechnikov, thank you for investigating this issue and identifying a possible solution. We will replicate the problem on our end and test whether your fix resolves it. We will get back to you on this soon.
Hey @yaroslav-nakonechnikov, we have merged the change to update the fsGroupChangePolicy. Please let us know if the issue still persists and we can revisit it.
@akondur this is good.
So now we need to wait until it is released.
For now I don't know how to check it, given that 2.5.0 and 2.5.1 are also not working as expected.
@yaroslav-nakonechnikov We have reverted the change because we are going to release 2.5.2 this week. We will re-introduce it in develop right after that. If this change is needed soon, we will make another minor release. We will update the PR here as soon as it's ready.
Hey @yaroslav-nakonechnikov, please find the MR merged into develop here. Please let me know if you're still facing issues with this change.
How can it be closed if it is not released yet?