GeoServer Pod Restart Failure with Thread Local Errors in Kubernetes Environment
not-Karot opened this issue · 1 comments
What is the bug or the crash?
GeoServer running in a Kubernetes pod fails to start after the pod is restarted. The logs show ThreadLocal errors and WebappClassLoaderBase illegal state exceptions. This seems to occur specifically on restart of a pod that mounts a persistent volume for the GeoServer data directory.
Steps to reproduce the issue
Deploy GeoServer on Kubernetes using the kartoza charts with these values for persistence:
persistence:
  geoserverDataDir:
    enabled: true
    existingClaim: "geoserver-data-dir-pvc"
    mountPath: /opt/persistence/data_dir
    size: 1Ti
    storageClass: "filestore-sc"
    accessModes:
      - ReadWriteMany
    annotations: {}
  geowebcacheCacheDir:
    enabled: true
    existingClaim: "geoserver-cache-dir-pvc"
    mountPath: /opt/persistence/data_dir/gwc
    size: 1Ti
    storageClass: "filestore-sc"
    accessModes:
      - ReadWriteMany
    annotations: {}
storageClassFS:
  volumeBindingMode: Immediate #WaitForFirstConsumer
  reclaimPolicy: Retain #Delete
  parameters:
    tier: standard
    network: ${network}
Attach a persistent volume to the GeoServer pod for storing the data directory.
Add workspace, stores and layers to the instance.
Kill the pod.
Let Kubernetes automatically restart the GeoServer pod.
Observe the errors in the pod logs.
Versions
GeoServer Version: 2.23.2 (but appears with any version)
Docker Image: docker.io/kartoza/geoserver:2.23.2
GCP Kubernetes Engine standard cluster
Filestore instance as persistent volume
Additional context
The problem arises only when the pod is restarted. The initial deployment, with no data linked to the GeoServer instance, doesn't show these errors. The persistent volume seems to be correctly configured, and this setup worked seamlessly before. I found recommendations online suggesting a change of the default data_dir. Consequently, I mounted my persistent volume at a different location (/opt/persistence/data_dir) and updated the container environment variables accordingly:
GEOSERVER_DATA_DIR: /opt/persistence/data_dir
GEOWEBCACHE_CACHE_DIR: /opt/persistence/data_dir/gwc
This change was expected to resolve the issue, but the startup errors persisted.
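For reference, here is a minimal sketch of how the mount and environment variables described above would look in a plain Kubernetes pod spec. It is not the chart's rendered output: the container and volume names are hypothetical, and the chart may expose these settings under different values keys. Only the image, paths, and claim names come from this report.

```yaml
# Sketch only: container and volume names are hypothetical placeholders.
containers:
  - name: geoserver                      # hypothetical container name
    image: docker.io/kartoza/geoserver:2.23.2
    env:
      - name: GEOSERVER_DATA_DIR
        value: /opt/persistence/data_dir
      - name: GEOWEBCACHE_CACHE_DIR
        value: /opt/persistence/data_dir/gwc
    volumeMounts:
      - name: geoserver-data             # hypothetical volume name
        mountPath: /opt/persistence/data_dir
      - name: geoserver-cache            # hypothetical volume name
        mountPath: /opt/persistence/data_dir/gwc
volumes:
  - name: geoserver-data
    persistentVolumeClaim:
      claimName: geoserver-data-dir-pvc
  - name: geoserver-cache
    persistentVolumeClaim:
      claimName: geoserver-cache-dir-pvc
```

Note that in this layout the cache volume is mounted inside the data directory volume, which matches the mountPath values listed in the reproduction steps.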
Additionally, as a solution to this problem, I am open to recommendations on best practices for maintaining the state of GeoServer when deploying on Kubernetes, ensuring horizontal scalability.
Can we move this to a discussion rather than an issue?