splunk/splunk-operator

Smartstore: Pod Role goes offline frequently

satellite-no opened this issue · 2 comments

Please select the type of request

Bug

Tell us more

Describe the request
SmartStore stops working with the message below:

10-30-2023 17:03:32.735 +0000 ERROR S3Client [119958 FilesystemOpExecutorWorker-16193] - command=list transactionId=0x7f7d86056000 rTxnId=0x7f79dafe7540 status=completed success=N uri=https://deepwatch-gotrhythm-smartstore.s3.amazonaws.com/ statusCode=400 statusDescription="Bad Request" payload="<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>ExpiredToken</Code><Message>The provided token has expired.</Message><Token-0>FwoGZXIvYXdzEDEaDDHMzuESfqpjCOaaZSKNBKi4yuxd47r5c0oX2x8GvKXVlnMU3Pgw8LNRhK0dGIAZyzZeH+rlcqVVWMKeT/IxJb6DD1Of2YOe+M5vTkLej4zU3UjpMBXBRbkPrifbAl6wJ+ZLA4gi8kcbg8M1Nuo9SD4jnFErCVGjQqo+cMZJwyA6PTyZSODQba3WdnNSKzrgkvQ6oSBsi0rCc4asRm8v+yL0mZzdIXnQ50haguzRIRi7vuMfRl8xMREslwPrlxGSc6LyJQ3Ww8OM01m7GbnbuZ/hangtXq+51u3DgtJkdmnYt671OeYHDMyRAapbBBUkHkfRo29GQppnT9Ca4lkVmLm6/eLcwskhEy3kJftM8GArzwgsYQfYF2O9q8Am/VFKKtoawjrwTUUBIpTBAVBbBzrca6SrYbr/95RUD7ARosd1WuUqyTiS/fCIfCwmbgT839kMxM2MCKvC6Dlke0fPqOuxEFFH5v0eTitzgqD1JagXSVMPGL1w4W5oDOFZzrCFvCPHFECQet2XhUu7yR8gOisCw+4VRauLB74FYWnXsHzsKWlYhztS4b03FP8hbETmplgAVfa1Jql6IVJxX+8pM17aWQyqdPu8nki02FBqjmKzmdJBUqw/RLmc5aov6q7RwHEW45atlYRqCCPXzLVRJ0TYFWtYw1VoxPwigIWFahwTRJNOeAV2tu7eqlu8gBpoKOV6116MpOtEIjhWgii3w9+pBjIqOGa9gOVtdeT6jU8hINNWlTM9uzKZgGs5VfEDGud7veme1Az/b9hfqfzL</Token-0><RequestId>2SJ37N63FJCZYV93</RequestId><HostId>xiWHSHJCC+imMuuJ1CbESlr/nJ3LgvZ5C"--

Expected behavior
Splunk should detect that the AWS token for the role has expired and refresh it.

Splunk setup on K8S
Standalone instance running Splunk v9.0.5 with Splunk Operator 2.7.0.
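For reference, a minimal sketch of what such a setup typically looks like with the Splunk Operator. The bucket name is taken from the error log above; the CR name, service account name, endpoint/region, and index layout are assumptions, not details confirmed from this cluster:

```yaml
apiVersion: enterprise.splunk.com/v4
kind: Standalone
metadata:
  name: s1                               # assumed name
spec:
  serviceAccount: splunk-smartstore-sa   # assumed SA bound to the pod IAM role (see below)
  smartstore:
    volumes:
      - name: s3-remote-store
        path: deepwatch-gotrhythm-smartstore          # bucket from the error log
        endpoint: https://s3-us-east-1.amazonaws.com  # assumed region/endpoint
        # no secretRef on purpose: credentials come from the pod's IAM role / web-identity token
    indexes:
      - name: main
        remotePath: $_index_name
        volumeName: s3-remote-store
```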

Reproduction/Testing steps

  • Set up Splunk to use a pod IAM role to authenticate to SmartStore (see the ServiceAccount sketch below) and let it run for a few days.
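For completeness, this is roughly what the "pod role" wiring looks like when it is done via EKS IRSA; the role ARN, namespace, and names are placeholders, and other mechanisms (kube2iam, node instance profiles) would look different:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: splunk-smartstore-sa      # must match spec.serviceAccount on the Standalone CR
  namespace: splunk-operator      # assumed namespace
  annotations:
    # IRSA: the EKS pod identity webhook injects AWS_ROLE_ARN and a projected
    # web-identity token file into pods that use this service account
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/splunk-smartstore  # placeholder ARN
```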

K8s environment

  • Useful information about the K8s environment being used, e.g. the K8s version and the kind of K8s cluster.

Proposed changes (optional)

  • NA

K8s collector data (optional)

Additional context (optional)

  • Add any other context about the problem here.

Looks as though this is happening every 24 hours on our cluster. This is a major issue! @sgontla any idea as to the cause or a workaround?
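For context, if the cluster is using IRSA, the 24-hour cadence lines up with the default expiry of the projected web-identity token. A sketch of what the EKS pod identity webhook typically injects into the pod spec (webhook defaults, not values confirmed from this cluster):

```yaml
volumes:
  - name: aws-iam-token
    projected:
      sources:
        - serviceAccountToken:
            audience: sts.amazonaws.com
            expirationSeconds: 86400   # 24 hours by default
            path: token
```

The kubelet rotates the projected token file before it expires, but the process inside the container has to re-read the file to pick up the new token; if Splunk's S3 client keeps using credentials derived from the old token, requests would start failing with ExpiredToken roughly once a day, which matches the behavior described here.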

@satellite-no Thanks for bringing this to our attention; we will take a look at it!