splunk/splunk-operator

Splunk Operator: controller doesn't detect if statefulset deleted for standalone type

yaroslav-nakonechnikov opened this issue · 10 comments

Please select the type of request

Bug

Tell us more

Describe the request
if there are some changes done in stateful set for standalone instance - controlled doesn't recreate it as it is done for other types

Expected behavior
controller creates statefulset from crd

$ kubectl get standalone
NAME     PHASE   DESIRED   READY   AGE
c-prod                             23m

in description there was warning:
Warning validateStandaloneSpec 20m splunk-standalone-controller validate standalone spec failed invalid Volume Name for App Source: custom. volume: csh, doesn't exist

but, it shouldn't be the case, as before deleting statefulset it was working

can you add CR spec here, we will retry on our end and let you know

broken example, which creates warning and blocking creating resources:

 appRepo:
    appInstallPeriodSeconds: 90
    appSources:
    - location: custom/
      name: custom
      scope: local
      volumeName: csh
    - location: base/
      name: base
      scope: local
      volumeName: csh
    appsRepoPollIntervalSeconds: 120
    installMaxRetries: 3
    volumes:
    - endpoint: https://s3-eu-central-1.amazonaws.com
      name: splunkapps
      path: bucketname/splunk-apps/csh
      provider: aws
      region: eu-central-1
      storageType: s3

correct example:

 appRepo:
    appInstallPeriodSeconds: 90
    appSources:
    - location: custom/
      name: custom
      scope: local
      volumeName: splunkapps
    - location: base/
      name: base
      scope: local
      volumeName: splunkapps
    appsRepoPollIntervalSeconds: 120
    installMaxRetries: 3
    volumes:
    - endpoint: https://s3-eu-central-1.amazonaws.com
      name: splunkapps
      path: bucketname/splunk-apps/csh
      provider: aws
      region: eu-central-1
      storageType: s3

so, all in all, expectation that warning doesn't block anything, but feature just not working.
or, it should be error - and in that case it will be expected.

Hi @yaroslav-nakonechnikov , in every custom resource's reconcile logic we do a validation of its spec and error out if any CR spec is erroneous(same for all CRs).

For standalone we validate its spec here. Another example of an indexer cluster here.

As mentioned here already, Splunk operator records a Warning event in the standalone CR indicating incorrect volume for the app source custom. Currently, only the following event types: {Normal, Warning} are supported as mentioned in the golang documentation here. In the future, if they introduce an {Error} type we will make sure to mark the event as an error.

Additionally we log an error here to indicate an erroneous config.

Please let us know if the explanation above is acceptable.

Hi @yaroslav-nakonechnikov , please let us know if the explanation above is acceptable.

explanation doesn't fix problem.

i expect that controller overrides stateful set (and or recreate it) in case of any changes in statefulset.

otherwise, behavior is not consistent and leads to additional issues, which are not expected.

Hey @yaroslav-nakonechnikov , currently we are looking to set the Phase as Error in the CR status in such scenarios to indicate that the CR spec is invalid i.e

NAME     PHASE   DESIRED   READY   AGE
c-prod   Error                     23m

In the future we are looking to add a Message field in the CR status to indicate such errors i.e

NAME     PHASE   DESIRED   READY   AGE   MESSAGE
c-prod   Error                     23m   validateStandaloneSpec: validate standalone spec failed invalid Volume Name for App Source: custom. volume: csh, doesn't exist

Please let us know if that solution works.

yes, adding nice log messages will help a lot!

Hey @yaroslav-nakonechnikov , can we close this issue as we have added a message in the CR status?

yes, for now good to be closed