Old ShinyProxy pods not deleted due to exceptions thrown by the operator.
Closed this issue · 6 comments
I scoured the ShinyProxy Operator configuration docs and, after not finding an option to terminate old ShinyProxy pods, I discovered that the operator itself was throwing errors.
Two errors stand out to me in the following operator logs:
- Caught an exception while processing event ShinyProxyEvent(eventType=CHECK_OBSOLETE_INSTANCES...: this explains why we have a huge list of old ShinyProxy pods that are not being terminated.
12:25:02.137 [atcher-worker-1] WARN eu.op.sh.co.ShinyProxyController - Caught an exception while processing event ShinyProxyEvent(eventType=CHECK_OBSOLETE_INSTANCES, shinyProxy=null, shinyProxyInstance=null, retried=false). [Attempt 5/5] Not re-processing this event.
# Describing the operator, seeing all linked shinyproxy pods.
...
Hash Of Spec: fbcc16f401d576da8b2b59094f6e07dc5723eefa
Is Latest Instance: false
Hash Of Spec: 01d0beee33c612b92bc28560febac80eecff246c
Is Latest Instance: false
Hash Of Spec: c0eb2bfac0dd0c1cf426286af180c9d6a8efd294
Is Latest Instance: false
Hash Of Spec: fcbfd4f1d429657a7c4565984bab55506af9b06c
Is Latest Instance: false
Hash Of Spec: 6db6fea3f15e7e20fcbdc7e29ff2ee47adde9a12
Is Latest Instance: false
Hash Of Spec: 29dfbe853cec465bf90ecdb7eba12196b64381f7
Is Latest Instance: false
Hash Of Spec: 43ead635d242be06cbb38cf4e4eaf99f00c9c621
Is Latest Instance: false
Hash Of Spec: afd6e70fbd4e59dabe9ee1a4f37fb0751532fdae
Is Latest Instance: false
Hash Of Spec: 5f9ff7a8c138de919edf8b8ddc26c42fd4da2d65
Is Latest Instance: false
Hash Of Spec: 4483e8889ff5dcab3a529bdc5520d69564b298ac
Is Latest Instance: false
Hash Of Spec: 58c35bd8e50f6847fcdd02addc2980dc33f1be9b
Is Latest Instance: false
Hash Of Spec: 31db954a008227c2e278373255935b8f9bb65de3
Is Latest Instance: false
Hash Of Spec: f2d0dc369c242d61a0f4ff71a1d40d94205554fd
Is Latest Instance: false
Hash Of Spec: ae0cbd081dcb643192a7e881e8e2491a435ec091
Is Latest Instance: true
sp-shinyproxy-dev-rate-rs-01d0beee33c612b92bc28560febac80epr4xl 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-0ca11437d19750ea62d19140dee8a5524txlj 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-0ce152a1e5b1ae58847301446d3e552dxh64w 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-11e8c0c49732350a93c8f3d7ebdb5859tnjvr 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-1dd6de6442a299b3a63e9eb69618bec555scq 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-1eb397919124a3b1cfb58243474c5f45s479f 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-2291893b87cd2e51e647c60ea11c7855ct9vd 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-25637d13e8dccb8b3020e5e721fc1f105zz57 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-29dfbe853cec465bf90ecdb7eba12196lfjkg 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-2b54f4584de5bd5c2b8c9f84760f5cb9vslxx 1/1 Running 0 7d2h
...
sp-shinyproxy-dev-rate-rs-c0eb2bfac0dd0c1cf426286af180c9d6rnbc6 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-ca8fd4b2134bfe66ece29318aa93076bwsgdz 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-d61b2d614f50f15d6c81ea61ce272ccb7r9hb 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-e93de208664caa110095f15b84119a22ls5zb 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-f2d0dc369c242d61a0f4ff71a1d40d945czjf 1/1 Running 0 42m
sp-shinyproxy-dev-rate-rs-f8659de8fbaf3db67d8348368930e800x2c5d 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-fbcc16f401d576da8b2b59094f6e07dc772b8 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-fcbfd4f1d429657a7c4565984bab5550b4tvp 1/1 Running 0 7d2h
sp-shinyproxy-dev-rate-rs-fd1c132a1e8e2273d53a05e632c43a4cngldp 1/1 Running 0 7d2h
- Unrecognized field "timestamp": perhaps some health checks that aren't properly configured?
12:25:05.168 [atcher-worker-1] WARN eu.op.sh.co.ShinyProxyController - Caught an exception while processing event. [Attempt 3/5]
com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "timestamp" (class eu.openanalytics.shinyproxyoperator.controller.RecyclableChecker$Response), not marked as ignorable (2 known properties: "isRecyclable", "activeConnections"])
at [Source: (String)"{"timestamp":"2023-12-12T12:25:05.168+00:00","status":404,"error":"Not Found","path":"/actuator/recyclable"}"; line: 1, column: 109] (through reference chain: eu.openanalytics.shinyproxyoperator.controller.RecyclableChecker$Response["timestamp"])
at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2023) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperties(BeanDeserializerBase.java:1650) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:539) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:351) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:184) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629) ~[shinyproxy-operator.jar:2.0.0]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.RecyclableChecker.checkServer(RecyclableChecker.kt:82) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.RecyclableChecker.isInstanceRecyclable(RecyclableChecker.kt:54) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.ShinyProxyController.checkForObsoleteInstances(ShinyProxyController.kt:293) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.ShinyProxyController.receiveAndHandleEvent$tryReceiveAndHandleEvent(ShinyProxyController.kt:109) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.ShinyProxyController.receiveAndHandleEvent(ShinyProxyController.kt:118) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.ShinyProxyController$receiveAndHandleEvent$1.invokeSuspend(ShinyProxyController.kt) ~[shinyproxy-operator.jar:2.0.0]
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.DispatchedTaskKt.resume(DispatchedTask.kt:178) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:166) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:397) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.CancellableContinuationImpl.completeResume(CancellableContinuationImpl.kt:513) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.channels.AbstractChannel$ReceiveElement.completeResumeReceive(AbstractChannel.kt:907) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.channels.ArrayChannel.offerInternal(ArrayChannel.kt:83) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.channels.AbstractSendChannel.send(AbstractChannel.kt:134) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.ShinyProxyController.scheduleAdditionalEvents(ShinyProxyController.kt:307) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.ShinyProxyController.access$scheduleAdditionalEvents(ShinyProxyController.kt:43) ~[shinyproxy-operator.jar:2.0.0]
at eu.openanalytics.shinyproxyoperator.controller.ShinyProxyController$scheduleAdditionalEvents$1.invokeSuspend(ShinyProxyController.kt) ~[shinyproxy-operator.jar:2.0.0]
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678) ~[shinyproxy-operator.jar:2.0.0]
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665) ~[shinyproxy-operator.jar:2.0.0]
12:25:05.177 [atcher-worker-1] WARN eu.op.sh.co.ShinyProxyController - Caught an exception while processing event. [Attempt 4/5]
Our ShinyProxy resource (secrets redacted):
apiVersion: openanalytics.eu/v1alpha1
kind: ShinyProxy
metadata:
  name: shinyproxy-dev
  namespace: dev
spec:
  spring:
    session:
      store-type: redis
    redis:
      password: <REDACTED>
      sentinel:
        master: shinyproxy
        password: <REDACTED>
        nodes: <REDACTED>,<REDACTED>,<REDACTED>
  management:
    endpoints:
      web:
        exposure:
          include: info,health,beans,prometheus,metrics
    metrics:
      export:
        prometheus:
          enabled: true
  server:
    secureCookies: true
    frameOptions: sameorigin
    forward-headers-strategy: native
    servlet:
      multipart:
        max-file-size: 50MB
        max-request-size: 50MB
  logging:
    file:
      name: shinyproxy.log
    level:
      io.undertow: DEBUG
      eu.openanalytics: DEBUG
      org.springframework: DEBUG
  proxy:
    store-mode: Redis
    stop-proxies-on-shutdown: false
    title: Development
    logoUrl: ""
    landing-page: /
    heartbeat-rate: 10000 # in milliseconds
    heartbeat-timeout: 60000 # in milliseconds
    container-wait-time: 60000 # in milliseconds
    default-proxy-max-lifetime: 1440 # in minutes
    port: 8080
    authentication: openid
    openid:
      auth-url: https://<REDACTED>/oauth2/v2.0/authorize
      token-url: https://<REDACTED>/oauth2/v2.0/token
      jwks-url: https://<REDACTED>/discovery/v2.0/keys
      client-id: <REDACTED>
      client-secret: <REDACTED>
      username-attribute: email
      roles-claim: roles
    usage-stats-url: micrometer
    container-backend: kubernetes
    kubernetes:
      internal-networking: true
      namespace: dev
      pod-wait-time: 600000 # in milliseconds
      image-pull-policy: IfNotPresent
      image-pull-secrets:
        - name: docker
    template-path: ./templates
    template-groups:
      - id: demo
        properties:
          display-name: DEMO
    specs: []
  kubernetesPodTemplateSpecPatches: |
    - op: add
      path: /spec/containers/0/env/-
      value:
        name: REDIS_PASSWORD
        valueFrom:
          secretKeyRef:
            name: redis
            key: redis-password
    - op: add
      path: /spec/containers/0/env/-
      value:
        name: dev
        valueFrom:
          secretKeyRef:
            name: secret
            key: dev
    - op: replace
      path: /spec/containers/0/livenessProbe
      value:
        failureThreshold: 2
        httpGet:
          path: /actuator/health/liveness
          port: 9090
          scheme: HTTP
        periodSeconds: 1
        initialDelaySeconds: 140
        successThreshold: 1
        timeoutSeconds: 1
    - op: replace
      path: /spec/containers/0/readinessProbe
      value:
        failureThreshold: 2
        httpGet:
          path: /actuator/health/readiness
          port: 9090
          scheme: HTTP
        periodSeconds: 1
        initialDelaySeconds: 140
        successThreshold: 1
        timeoutSeconds: 1
    - op: add
      path: /spec/volumes/-
      value:
        name: shinyproxy-templates-dev
        persistentVolumeClaim:
          claimName: shinyproxy-templates-dev
    - op: add
      path: /spec/containers/0/volumeMounts/-
      value:
        mountPath: "/opt/shinyproxy/templates"
        name: shinyproxy-templates-dev
    - op: add
      path: /spec/containers/0/resources
      value:
        limits:
          cpu: 1
          memory: 1Gi
        requests:
          cpu: 0.5
          memory: 1Gi
    - op: add
      path: /spec/serviceAccountName
      value: default
  kubernetesIngressPatches: |
    - op: add
      path: /metadata/annotations
      value:
        nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/affinity: cookie
        nginx.ingress.kubernetes.io/proxy-read-timeout: "420"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "420"
        nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
        nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
        nginx.ingress.kubernetes.io/proxy-body-size: 5000m
        cert-manager.io/cluster-issuer: sectigo
    - op: add
      path: /spec/ingressClassName
      value: nginx
    - op: add
      path: /spec/tls
      value:
        - hosts:
            - dev.example.com
          secretName: dev-tls
  image: example.com/openanalytics/shinyproxy:3.0.2
  imagePullPolicy: Always
  image-pull-secrets:
    - name: docker
  replicas: 1
  fqdn: dev.example.com
If there's anything I can try, let me know.
@cdenneen No, I have not had the time to properly investigate. I was hoping you guys had an easy solution or some tips. It's getting more urgent though: none of the Shiny pods are killed upon upgrade, so the number of pods that are just running and wasting money is getting obscene. I'll have a look when I find the time or priorities change.
@leynebe I don't work on this project, I'm just a user like yourself. If you find a solution, let me know. Otherwise the only thing I can think of is a CronJob to determine and kill the lingering pods. I'm actually having more of an issue with the proxied apps staying around. I have stop-proxies-on-shutdown: false, so I do expect to see them stay around for a bit, but I tested it and when I clicked the same app again it spun up a new pod rather than reusing the one that had been running for 4 hours.
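A CronJob along those lines could look like the sketch below. This is only a rough workaround idea, not something I have run: the schedule, image, service account name, and RBAC setup (the service account needs list/delete permissions on pods) are all assumptions, and LATEST_HASH would have to be updated on every redeploy.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-old-shinyproxy-pods
  namespace: dev
spec:
  schedule: "0 * * * *"   # hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner   # assumed SA with pod list/delete RBAC
          restartPolicy: Never
          containers:
            - name: cleanup
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # Delete every operator-managed pod except the latest instance.
                  # LATEST_HASH must be kept in sync with the current spec hash.
                  LATEST_HASH=ae0cbd081dcb643192a7e881e8e2491a
                  kubectl get pods -n dev -o name \
                    | grep 'sp-shinyproxy-dev' \
                    | grep -v "$LATEST_HASH" \
                    | xargs -r kubectl delete -n dev
```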
Hi @leynebe
I looked into this and found that this is being caused by the following part of your configuration:
management:
  endpoints:
    web:
      exposure:
        include: info,health,beans,prometheus,metrics
This property tells Spring which actuator (management) endpoints to expose. The default ShinyProxy config includes an endpoint called recyclable
(https://github.com/openanalytics/containerproxy/blob/2c71c88a0f8a8f71e2551343e09b659c6f11c1fe/src/main/java/eu/openanalytics/containerproxy/ContainerProxyApplication.java#L342), which is used by the operator to check whether a ShinyProxy instance still has active websocket connections. Because your include list overrides the default, the recyclable endpoint is no longer exposed, so the operator receives the 404 error body you see in the logs instead of the expected response.
Usually there is no need to configure this option, so I would advise to just remove it. If you do need this option, it would be useful to know the reason, so that we can either better cover this use case or document it on the website.
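If you do want to keep a custom exposure list, adding the recyclable endpoint back explicitly should also work. I have not verified this variant myself; it is based on the endpoint id in the linked code:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: info,health,beans,prometheus,metrics,recyclable
```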
Even once you remove this option, the existing old instances will not be removed automatically. You can either re-deploy ShinyProxy (by removing the custom resource and re-creating it), or you can manually remove the old instances from the ShinyProxy resource, using:
kubectl edit shinyproxy <crd_name> -n <namespace> --subresource='status'
Next you need to remove the old replicasets and configmaps created by shinyproxy.
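That cleanup can be scripted; a minimal sketch, assuming the operator encodes the first 32 characters of the spec hash in resource names (which matches the pod list above) and that everything lives in the dev namespace. Always do the dry run first.

```shell
# Hash of the instance marked "Is Latest Instance: true", truncated to the
# 32 characters that appear in resource names.
LATEST_HASH="ae0cbd081dcb643192a7e881e8e2491a"

# Dry run -- list everything that would be deleted:
#   kubectl get rs,configmap -n dev -o name | grep 'sp-shinyproxy-dev' | grep -v "$LATEST_HASH"
# Then pipe the same selection into kubectl delete:
#   kubectl get rs,configmap -n dev -o name | grep 'sp-shinyproxy-dev' | grep -v "$LATEST_HASH" \
#     | xargs -r kubectl delete -n dev

# The selection logic itself, demonstrated on two sample names: only the
# non-latest instance survives the filter (and would be deleted).
printf '%s\n' \
  "sp-shinyproxy-dev-rate-rs-fbcc16f401d576da8b2b59094f6e07dc" \
  "sp-shinyproxy-dev-rate-rs-${LATEST_HASH}" \
  | grep -v "$LATEST_HASH"
```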
> Usually there is no need to configure this option, so I would advise to just remove it. If you do need this option, it would be useful to know the reason, so that we can either better cover this use case or document it on the website.
I will try adding this default option back. Thanks for the research and explanation!!