mariadb-operator/mariadb-operator

[Bug] Galera with MaxScale is failing to recover

Closed this issue · 22 comments


Describe the bug

An upgrade from 11.3.2 to 11.4.2 seemed to fail, the cluster stopped working, and I don't know exactly how.

Expected behaviour
It should be able to recover.

Steps to reproduce the bug

I don't know how to reproduce it, as it was working fine for a while and then stopped.

Debug information

  • Related object events:
LAST SEEN   TYPE      REASON                OBJECT                       MESSAGE
21s         Warning   RecreatingFailedPod   statefulset/mariadb-galera   StatefulSet databases/mariadb-galera is recreating failed Pod mariadb-galera-0

Environment details:

  • Kubernetes version: v1.29.0+k3s1
  • Kubernetes distribution: k3s
  • mariadb-operator version: 0.28.1
  • Install method: Helm
  • Install flavor: custom

Additional context

k logs --context nebula -n databases mariadb-galera-0
Defaulted container "mariadb" out of: mariadb, agent, init (init)
2024-06-03 07:39:34+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.3.2+maria~ubu2204 started.
2024-06-03 07:39:34+00:00 [Note] [Entrypoint]: Initializing database files
2024-06-03  7:39:35 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.


PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !
To do so, start the server, then issue the following command:

'/usr/bin/mariadb-secure-installation'

which will also give you the option of removing the test
databases and anonymous user created by default.  This is
strongly recommended for production servers.

See the MariaDB Knowledgebase at https://mariadb.com/kb

Please report any problems at https://mariadb.org/jira

The latest information about MariaDB is available at https://mariadb.org/.

Consider joining MariaDB's strong and vibrant community:
https://mariadb.org/get-involved/

2024-06-03 07:39:35+00:00 [Note] [Entrypoint]: Database files initialized
2024-06-03 07:39:35+00:00 [Note] [Entrypoint]: Starting temporary server
2024-06-03 07:39:35+00:00 [Note] [Entrypoint]: Waiting for server startup
----
k logs --context nebula -n databases mariadb-galera-0 -c agent
{"level":"info","ts":1717400415.948036,"msg":"Starting agent"}
{"level":"info","ts":1717400415.948769,"logger":"server","msg":"server listening","addr":":5555"}
{"level":"error","ts":1717400416.7483642,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"error","ts":1717400417.748818,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"error","ts":1717400418.7486272,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"error","ts":1717400419.749036,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"error","ts":1717400420.7489088,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
2024/06/03 07:40:21 "POST http://mariadb-galera-0.mariadb-galera-internal.databases.svc.cluster.local:5555/api/recovery HTTP/1.1" from 10.244.0.253:38368 - 500 56B in 4.9401536s
{"level":"error","ts":1717400421.718159,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"info","ts":1717400421.722613,"logger":"server","msg":"shutting down server"}
{"level":"error","ts":1717400422.719305,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"info","ts":1717400422.7236068,"logger":"server","msg":"graceful shutdown timed out"}
{"level":"error","ts":1717400422.7236707,"msg":"server error","error":"error shutting down server: context deadline exceeded","stacktrace":"github.com/mariadb-operator/mariadb-operator/cmd/agent.init.func1\n\t/app/cmd/agent/main.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:987\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\t/app/cmd/controller/main.go:380\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271"}

Hey there! Thanks for reporting

bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>

The agent is unable to parse the grastate.dat file used to obtain the seqno and uuid of each node during the cluster recovery process. It seems like something has changed in MariaDB 11.4 that we don't yet support on the Galera agent side. We will look into this, but for now I suggest you stay on the previous version.
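For context, grastate.dat is a small key/value text file (see the examples later in this thread). A minimal sketch of the kind of parsing involved, assuming that format and not reflecting the operator's actual implementation:

package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseGrastate extracts the uuid and seqno entries from the contents of a
// grastate.dat file. Illustrative sketch only; a real agent would also
// validate the values (e.g. parse seqno as an integer and uuid as a UUID).
func parseGrastate(contents string) (uuid, seqno string, err error) {
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "#") {
			continue // skip the "# GALERA saved state" header
		}
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		switch strings.TrimSpace(key) {
		case "uuid":
			uuid = strings.TrimSpace(value)
		case "seqno":
			seqno = strings.TrimSpace(value)
		}
	}
	if uuid == "" || seqno == "" {
		return "", "", fmt.Errorf("unable to find uuid and seqno: uuid=%q seqno=%q", uuid, seqno)
	}
	return uuid, seqno, nil
}

func main() {
	uuid, seqno, err := parseGrastate(`# GALERA saved state
version: 2.1
uuid:    9ca5802e-137c-11ef-b62c-7b147f2d3c67
seqno:   -1
safe_to_bootstrap: 1
`)
	fmt.Println(uuid, seqno, err) // 9ca5802e-137c-11ef-b62c-7b147f2d3c67 -1 <nil>
}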

Hey there @samip5 ! Please take a look at this comment to see if this helps:

#672 (comment)

error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>

@samip5 could you get a copy of your grastate.dat so we can see why the Galera agent is unable to parse it? It is available at /var/lib/mysql/grastate.dat

error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>

@samip5 could you get a copy of your grastate.dat so we can see why the Galera agent is unable to parse it? It is available at /var/lib/mysql/grastate.dat

That might prove difficult, as I'm unable to even get inside the container due to it crash-looping.

There also seems to be an issue with writing a config?

{"level":"debug","ts":1720175026.8486392,"logger":"galera.recovery.cluster","msg":"Error polling","controller":"mariadb","controllerGroup":"k8s.mariadb.com","controllerKind":"MariaDB","MariaDB":{"name":"mariadb-galera","namespace":"databases"},"namespace":"databases","name":"mariadb-galera","reconcileID":"ecd4de99-edc6-4a92-b057-c1aab0f2eab8","err":"error writing recovery config: open /etc/mysql/mariadb.conf.d/2-recovery.cnf: permission denied"}

That might prove difficult, as I'm unable to even get inside the container due to it crash-looping.

Debug containers don't support volumes just yet, unfortunately. You can create a debug Pod yourself:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mariadb-debug
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: storage-mariadb-0
  containers:
  - name: debug-container
    image: busybox
    command: [ "sleep", "infinity" ]
    volumeMounts:
    - mountPath: /var/lib/mysql
      name: data
EOF
kubectl exec -it mariadb-debug -- cat /var/lib/mysql/grastate.dat

{"level":"debug","ts":1720175026.8486392,"logger":"galera.recovery.cluster","msg":"Error polling","controller":"mariadb","controllerGroup":"k8s.mariadb.com","controllerKind":"MariaDB","MariaDB":{"name":"mariadb-galera","namespace":"databases"},"namespace":"databases","name":"mariadb-galera","reconcileID":"ecd4de99-edc6-4a92-b057-c1aab0f2eab8","err":"error writing recovery config: open /etc/mysql/mariadb.conf.d/2-recovery.cnf: permission denied"}

This is definitely something: the agent won't be able to recover the Pod if it doesn't manage to put MariaDB in recovery mode. Could you provide the following?

  • StorageClass and CSIDriver that you are using
  • Your Pod securityContext to ensure that your agent has write permissions to /etc/mysql/mariadb.conf.d/

With the default StatefulSet securityContext configuration, user 999 should be able to write to /var/lib/mysql, as the kubelet will recursively change the permissions on all mounts based on fsGroup. This requires a CSIDriver compatible with the fsGroup feature. Here is a valid podSecurityContext to set in the MariaDB resource:

  podSecurityContext:
    fsGroup: 999
    runAsGroup: 999
    runAsNonRoot: true
    runAsUser: 999

storageclass (galera-mariadb-galera-0 pvc):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: rook-ceph-cluster
    meta.helm.sh/release-namespace: rook-ceph
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2024-05-08T01:06:28Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: rook-ceph-cluster
    helm.toolkit.fluxcd.io/namespace: rook-ceph
  name: fast-ceph-filesystem
  resourceVersion: "48720"
  uid: 90ed9032-e69d-4420-b769-87b5c3d87585
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  fsName: ceph-filesystem
  pool: ceph-filesystem-fast-data0
provisioner: rook-ceph.cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true

csidriver (for the above pvc):

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  creationTimestamp: "2024-05-08T01:08:24Z"
  name: rook-ceph.cephfs.csi.ceph.com
  resourceVersion: "52784"
  uid: ce63a7ba-47e0-4345-987a-fd5d077dec72
spec:
  attachRequired: true
  fsGroupPolicy: File
  podInfoOnMount: false
  requiresRepublish: false
  seLinuxMount: true
  storageCapacity: false
  volumeLifecycleModes:
  - Persistent

SecurityContext:

  podSecurityContext:
    runAsUser: 568
    runAsGroup: 568
    fsGroup: 568
    fsGroupChangePolicy: OnRootMismatch

error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>

@samip5 could you get a copy of your grastate.dat so we can see why the Galera agent is unable to parse it? It is available at /var/lib/mysql/grastate.dat

No wonder it's a problem...

cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0

The second one has a valid-ish grastate:

cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    9ca5802e-137c-11ef-b62c-7b147f2d3c67
seqno:   -1
safe_to_bootstrap: 1

The storage PVC uses a different CSI driver and StorageClass than the galera one shown above, but it supports fsGroup too. It's also Rook, but ceph-block instead of CephFS.

grastate.dat looks good; this should not be the issue at this point.

CSIDriver has fsGroupPolicy: File, so it should be capable of changing the volume ownership.
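For context, those grastate.dat fields are what recovery uses to choose a node to bootstrap the new cluster from: a node with safe_to_bootstrap: 1 wins, otherwise the one with the highest seqno. A simplified sketch of that rule, not the operator's actual code:

package main

import "fmt"

// GaleraState holds the fields read from a node's grastate.dat.
type GaleraState struct {
	Name            string
	Seqno           int64
	SafeToBootstrap bool
}

// pickBootstrapNode chooses which node to bootstrap from: a node explicitly
// marked safe_to_bootstrap wins outright, otherwise the highest seqno.
func pickBootstrapNode(states []GaleraState) int {
	best := 0
	for i, s := range states {
		if s.SafeToBootstrap {
			return i
		}
		if s.Seqno > states[best].Seqno {
			best = i
		}
	}
	return best
}

func main() {
	// The two nodes from this thread: seqno -1 means no clean shutdown was recorded.
	nodes := []GaleraState{
		{Name: "mariadb-galera-0", Seqno: -1, SafeToBootstrap: false},
		{Name: "mariadb-galera-1", Seqno: -1, SafeToBootstrap: true},
	}
	fmt.Println("bootstrap from:", nodes[pickBootstrapNode(nodes)].Name) // mariadb-galera-1
}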

As far as I know, our images should allow changing the running user to something other than 999, but just to rule out possible problems, could you try changing the podSecurityContext to the following?

  podSecurityContext:
    runAsUser: 999
    runAsGroup: 999
    fsGroup: 999

Just to confirm, you are using mariadb:11.4.2, right?

cc @grooverdan just in case I am missing something regarding the images

Just to confirm, you are using mariadb:11.4.2, right?

I had rolled back to 11.3.2 after the auto-update borked it, but that didn't fix it, so it's still set to 11.3.2.

grastate.dat looks good; this should not be the issue at this point.

Are you sure? Even though mariadb-galera-0 has the one with an all-zero uuid?

Are you sure? Even though mariadb-galera-0 has the one with an all-zero uuid?

Yes, we are parsing the uuid here:

if _, err := guuid.Parse(g.UUID); err != nil {

and as you can see, no error is returned: https://go.dev/play/p/VOK0aR5JvZG
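A minimal reproduction of that playground snippet, assuming guuid is an alias for github.com/google/uuid: the all-zero uuid is syntactically valid, so Parse returns no error.

package main

import (
	"fmt"

	guuid "github.com/google/uuid"
)

func main() {
	// The all-zero uuid from mariadb-galera-0's grastate.dat still parses fine.
	u, err := guuid.Parse("00000000-0000-0000-0000-000000000000")
	fmt.Printf("uuid=%s err=%v\n", u, err) // uuid=00000000-0000-0000-0000-000000000000 err=<nil>
}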

galera-0:

k logs -n databases mariadb-galera-0
Defaulted container "mariadb" out of: mariadb, agent, init (init)
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.3.2+maria~ubu2204 started.
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Initializing database files
2024-07-05 12:16:08 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.


PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !
To do so, start the server, then issue the following command:

'/usr/bin/mariadb-secure-installation'

which will also give you the option of removing the test
databases and anonymous user created by default.  This is
strongly recommended for production servers.

See the MariaDB Knowledgebase at https://mariadb.com/kb

Please report any problems at https://mariadb.org/jira

The latest information about MariaDB is available at https://mariadb.org/.

Consider joining MariaDB's strong and vibrant community:
https://mariadb.org/get-involved/

2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Database files initialized
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Starting temporary server
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Waiting for server startup
2024-07-05 12:16:40+00:00 [ERROR] [Entrypoint]: Unable to start server.

galera-1:

2024-07-05 12:15:21+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.3.2+maria~ubu2204 started.
2024-07-05 12:15:22+00:00 [Note] [Entrypoint]: MariaDB upgrade information missing, assuming required
2024-07-05 12:15:22+00:00 [Note] [Entrypoint]: MariaDB upgrade (mariadb-upgrade or creating healthcheck users) required, but skipped due to $MARIADB_AUTO_UPGRADE setting
2024-07-05 12:15:22 0 [Note] Starting MariaDB 11.3.2-MariaDB-1:11.3.2+maria~ubu2204 source revision 068a6819eb63bcb01fdfa037c9bf3bf63c33ee42 as process 1
2024-07-05 12:15:22 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2024-07-05 12:15:22 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2024-07-05 12:15:22 0 [Note] WSREP: wsrep_load(): Galera 26.4.16(r7dce5149) by Codership Oy <info@codership.com> loaded successfully.
2024-07-05 12:15:22 0 [Note] WSREP: Initializing allowlist service v1
2024-07-05 12:15:22 0 [Note] WSREP: Initializing event service v1
2024-07-05 12:15:22 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2024-07-05 12:15:22 0 [Note] WSREP: Found saved state: 9ca5802e-137c-11ef-b62c-7b147f2d3c67:-1, safe_to_bootstrap: 1
2024-07-05 12:15:22 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 9ca5802e-137c-11ef-b62c-7b147f2d3c67
Seqno: 984983 - 1038778
Offset: 85065224
Synced: 1
2024-07-05 12:15:22 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 9ca5802e-137c-11ef-b62c-7b147f2d3c67, offset: 85065224
2024-07-05 12:15:22 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
2024-07-05 12:15:22 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2024-07-05 12:15:22 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 984983-1038778
2024-07-05 12:15:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (        0/132880832 bytes) complete.
2024-07-05 12:15:22 0 [Note] WSREP: Recovering GCache ring buffer: found 3/53799 locked buffers
2024-07-05 12:15:22 0 [Note] WSREP: Recovering GCache ring buffer: free space: 1337536/134217728
2024-07-05 12:15:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (132880832/132880832 bytes) complete.
2024-07-05 12:15:22 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.244.8.106; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.keep_plaintext_size = 128M; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr
2024-07-05 12:15:22 0 [Note] WSREP: Start replication
2024-07-05 12:15:22 0 [Note] WSREP: Connecting with bootstrap option: 0
2024-07-05 12:15:22 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2024-07-05 12:15:22 0 [Note] WSREP: protonet asio version 0
2024-07-05 12:15:22 0 [Note] WSREP: Using CRC-32C for message checksums.
2024-07-05 12:15:22 0 [Note] WSREP: backend: asio
2024-07-05 12:15:22 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2024-07-05 12:15:22 0 [Note] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2024-07-05 12:15:22 0 [Note] WSREP: restore pc from disk failed
2024-07-05 12:15:22 0 [Note] WSREP: GMCast version 0
2024-07-05 12:15:22 0 [Note] WSREP: (408ad6c0-bef7, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2024-07-05 12:15:22 0 [Note] WSREP: (408ad6c0-bef7, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2024-07-05 12:15:22 0 [Note] WSREP: EVS version 1
2024-07-05 12:15:22 0 [Note] WSREP: gcomm: connecting to group 'mariadb-operator', peer 'mariadb-galera-0.mariadb-galera-internal.databases.svc.cluster.local:,mariadb-galera-1.mariadb-galera-internal.databases.svc.cluster.local:'
2024-07-05 12:15:25 0 [Note] WSREP: EVS version upgrade 0 -> 1
2024-07-05 12:15:25 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2024-07-05 12:15:25 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2024-07-05 12:15:25 0 [Note] WSREP: view(view_id(NON_PRIM,408ad6c0-bef7,1) memb {
	408ad6c0-bef7,0
} joined {
} left {
} partitioned {
})
2024-07-05 12:15:25 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50173S), skipping check
2024-07-05 12:15:55 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2024-07-05 12:15:55 0 [Note] WSREP: view((empty))
2024-07-05 12:15:55 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
	 at ./gcomm/src/pc.cpp:connect():160
2024-07-05 12:15:55 0 [ERROR] WSREP: ./gcs/src/gcs_core.cpp:gcs_core_open():221: Failed to open backend connection: -110 (Connection timed out)
2024-07-05 12:15:56 0 [ERROR] WSREP: ./gcs/src/gcs.cpp:gcs_open():1674: Failed to open channel 'mariadb-operator' at 'gcomm://mariadb-galera-0.mariadb-galera-internal.databases.svc.cluster.local,mariadb-galera-1.mariadb-galera-internal.databases.svc.cluster.local': -110 (Connection timed out)
2024-07-05 12:15:56 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2024-07-05 12:15:56 0 [ERROR] WSREP: wsrep::connect(gcomm://mariadb-galera-0.mariadb-galera-internal.databases.svc.cluster.local,mariadb-galera-1.mariadb-galera-internal.databases.svc.cluster.local) failed: 7
2024-07-05 12:15:56 0 [ERROR] Aborting

So how do I switch the primary to be the second node?

Ah, it managed to figure itself out.

I only had to change the securityContext and delete the galera PVC for node 0, as well as change the timeout.

change the securityContext

Change it to what? To user and group 999? This would indicate that the image might not be compatible with non-999 users/groups.

delete the galera PVC for node 0

Could this mean that the CSIDriver (not sure if by definition or due to the implementation, Rook) only changes the permissions when provisioning the PVC?

change the timeout

Yeah, podRecoveryTimeout = 3m does not fit all sizes; it is recommended that you adapt it to your cluster's needs.

Change it to what? To user and group 999? This would indicate that the image might not be compatible with non-999 users/groups.

I changed it to 999. It more than likely means that the folder is owned in a way that users/groups other than 999 cannot modify it.

Could this mean that the CSIDriver (not sure if by definition or due to the implementation, Rook) only changes the permissions when provisioning the PVC?

It's not a CSIDriver thing but a k3s/containerd thing, to my understanding.

I changed it to 999.

Great thanks for confirming.

It's not a CSIDriver thing but a k3s/containerd thing, to my understanding.

Perhaps; we need to investigate further.

Thank you very much for helping troubleshoot this; your contributions are very much appreciated. Are we good to close the issue?

Are we good to close the issue?

I would say yes, as my workload using MariaDB (WordPress) started working again.

@samip5 Great, thanks for your contribution!