splunk/splunk-operator

Splunk Operator: splunk_indexer : Remove existing HEC token results in failed indexer pod startup


Please select the type of request

Bug

Tell us more

Describe the request

  • I'm upgrading from Splunk Operator 2.2.0 to 2.5.2 and also attempting to use Splunk 9.1.4.

Expected behavior

  • The indexer pods should start without an error.

Splunk setup on K8S

  • Multisite cluster with cluster manager

Reproduction/Testing steps

  • When the pods are upgraded, they throw an error. Rolling back the Splunk image to docker.io/splunk/splunk:9.0.3-a2 stops the issue from occurring, so I'm unsure whether this is an issue in the Docker image, the operator, or a combination of the two.

Additional context(optional)
In the operator I used:

image:
  repository: docker.io/splunk/splunk:9.1.4

splunkOperator:
  enabled: true
  clusterWideAccess: true

  # Specify volumes for Splunk Operator pod, append additional volumes to list
  # reference: https://kubernetes.io/docs/concepts/storage/volumes/
  volumes:
  - name: app-staging
    persistentVolumeClaim:
      claimName: splunk-operator-app-download

  # Specify volume mounts for the manager container, append additional volume mounts to list
  # reference: https://kubernetes.io/docs/tasks/configure-pod-container/configure-volume-storage/
  volumeMounts:
  - mountPath: /opt/splunk/appframework/
    name: app-staging

The logs show:

TASK [splunk_indexer : Remove existing HEC token] ******************************
fatal: [localhost]: FAILED! => {
    "changed": false,
    "elapsed": 0,
    "redirected": false,
    "status": -1,
    "url": "https://127.0.0.1:8089/services/data/inputs/http/splunk_hec_token",
    "warnings": [
        "Module did not set no_log for password"
    ]
}

MSG:

Status code was -1 and not [200, 404]: Request failed: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1091)>

PLAY RECAP *********************************************************************
localhost                  : ok=82   changed=8    unreachable=0    failed=1    skipped=61   rescued=0    ignored=0

I've tested adding SSL certificates to the deployment, without success so far.
The cluster manager pod doesn't seem to have this issue, only the indexer pods.

Under defaults, I tested:

    config:
      env:
        verify: false

I also tried setting the SSL config via:

defaults:
  splunk:
    ssl:
      ca: /mnt/peers-splunk-ca/tls.crt
      cert: /mnt/peers-splunk-cert/tls.crt

Neither change resolved the error.

We believe this is due to the verify flag in the underlying splunk-ansible configuration steps. An open PR, https://github.com/splunk/splunk-ansible/pull/818/files, is intended to address this issue.
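For anyone trying to confirm the diagnosis from inside an indexer pod, here is a minimal Python sketch of the same kind of request the failing task makes, with certificate verification switched on or off. It is not the splunk-ansible implementation; the function names `build_context` and `probe` are hypothetical, and returning -1 on a connection or TLS failure is only my way of mirroring the status shown in the log above.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def build_context(verify: bool, ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context (hypothetical helper, not part of splunk-ansible).

    verify=True  -> the server certificate must chain to a trusted CA
                    (the system store, or ca_file if one is given).
    verify=False -> the certificate chain and hostname are not checked.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    if not verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe(url: str, verify: bool, ca_file: Optional[str] = None, timeout: float = 5.0) -> int:
    """Send an unauthenticated GET and return the HTTP status.

    Returns -1 if the request fails before an HTTP response is received,
    for example on a certificate verification error.
    """
    ctx = build_context(verify, ca_file)
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # An HTTP error (such as 401 without credentials) still means the
        # TLS handshake succeeded.
        return err.code
    except urllib.error.URLError as err:
        print(f"request failed: {err.reason}")
        return -1
```

Calling `probe("https://127.0.0.1:8089/services/data/inputs/http/splunk_hec_token", verify=True)` in the pod should return -1 if the self-signed chain is not trusted, while `verify=False`, or `verify=True` with `ca_file` pointing at the mounted CA (for example /mnt/peers-splunk-ca/tls.crt), should return an HTTP status instead. If that holds, it would support the theory that the task is verifying certificates regardless of the configured setting.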