kartoza/docker-geoserver

GeoServer Pod Restart Failure with Thread Local Errors in Kubernetes Environment

not-Karot opened this issue · 1 comment

What is the bug or the crash?

GeoServer running in a Kubernetes pod fails to start after the pod is restarted. The logs show ThreadLocal errors and IllegalStateException errors raised from WebappClassLoaderBase. This appears to happen specifically when the pod, which mounts a persistent volume for the GeoServer data directory, is restarted.

Steps to reproduce the issue

Deploy GeoServer on Kubernetes using the Kartoza charts with the following persistence values:

persistence:
  geoserverDataDir:
    enabled: true
    existingClaim: "geoserver-data-dir-pvc"
    mountPath: /opt/persistence/data_dir
    size: 1Ti
    storageClass: "filestore-sc"
    accessModes:
      - ReadWriteMany
    annotations: {}

  geowebcacheCacheDir:
    enabled: true
    existingClaim: "geoserver-cache-dir-pvc"
    mountPath: /opt/persistence/data_dir/gwc
    size: 1Ti
    storageClass: "filestore-sc"
    accessModes:
      - ReadWriteMany
    annotations: {}

storageClassFS:
  volumeBindingMode: Immediate # WaitForFirstConsumer
  reclaimPolicy: Retain # Delete
  parameters:
    tier: standard
    network: ${network}

Attach a persistent volume to the GeoServer pod to store the data directory.
Add workspaces, stores, and layers to the instance.
Kill the pod.
Let Kubernetes restart the GeoServer pod automatically.
Observe the errors in the pod logs.
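For completeness, here is a sketch of the two claims referenced by `existingClaim` in the values above, assuming they are created outside the chart. The manifests only mirror the `size`, `storageClass`, and `accessModes` values from this issue and assume the `filestore-sc` StorageClass already exists; they are illustrative, not taken from the chart itself.

```yaml
# Hypothetical standalone PVCs matching the existingClaim names used in the values.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: geoserver-data-dir-pvc
spec:
  accessModes:
    - ReadWriteMany          # shared access, as in the chart values
  storageClassName: filestore-sc
  resources:
    requests:
      storage: 1Ti
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: geoserver-cache-dir-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: filestore-sc
  resources:
    requests:
      storage: 1Ti
```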

Versions

GeoServer version: 2.23.2 (the issue also occurs with other versions)
Docker image: docker.io/kartoza/geoserver:2.23.2
Google Kubernetes Engine (GKE) Standard cluster
Filestore instance as the persistent volume

Additional context

The problem occurs only when the pod is restarted. The initial deployment, before any data has been added to the GeoServer instance, does not show these errors. The persistent volume appears to be configured correctly, and this setup previously worked without problems. I found recommendations online suggesting that the data directory be moved away from the default data_dir location, so I mounted my persistent volume at a different path (/opt/persistence/data_dir) and updated the container environment variables accordingly:

GEOSERVER_DATA_DIR: /opt/persistence/data_dir
GEOWEBCACHE_CACHE_DIR: /opt/persistence/data_dir/gwc

This change was expected to resolve the issue, but the startup errors persisted.
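For reference, a minimal sketch of how the relocated mounts and environment variables line up in a plain Kubernetes pod spec, assuming they are set directly rather than through chart values. The volume names `geoserver-data-dir` and `geoserver-cache-dir` are hypothetical; the claim names, paths, and image tag come from this issue. As in the values above, the GeoWebCache volume is mounted inside the data directory mount.

```yaml
# Pod spec fragment (e.g. under spec.template.spec of a Deployment or StatefulSet).
containers:
  - name: geoserver
    image: docker.io/kartoza/geoserver:2.23.2
    env:
      - name: GEOSERVER_DATA_DIR
        value: /opt/persistence/data_dir
      - name: GEOWEBCACHE_CACHE_DIR
        value: /opt/persistence/data_dir/gwc
    volumeMounts:
      - name: geoserver-data-dir        # hypothetical volume name
        mountPath: /opt/persistence/data_dir
      - name: geoserver-cache-dir       # hypothetical volume name, nested inside the data dir mount
        mountPath: /opt/persistence/data_dir/gwc
volumes:
  - name: geoserver-data-dir
    persistentVolumeClaim:
      claimName: geoserver-data-dir-pvc
  - name: geoserver-cache-dir
    persistentVolumeClaim:
      claimName: geoserver-cache-dir-pvc
```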

I would also welcome recommendations on best practices for persisting GeoServer state when deploying on Kubernetes while keeping the deployment horizontally scalable.

Can we move this to a discussion rather than an issue?