[Bug] Galera with maxScale is failing to recover
Closed this issue · 22 comments
Documentation
- I acknowledge that I have read the relevant documentation.
Describe the bug
After an upgrade from 11.3.2 to 11.4.2 that seemed to fail, the cluster became non-functional, and I don't know exactly how.
Expected behaviour
It should be able to recover.
Steps to reproduce the bug
I don't know; it was working fine for a while and then stopped.
Debug information
- Related object events:
LAST SEEN TYPE REASON OBJECT MESSAGE
21s Warning RecreatingFailedPod statefulset/mariadb-galera StatefulSet databases/mariadb-galera is recreating failed Pod mariadb-galera-0
- mariadb-operator logs: https://p.kapsi.fi/?2594480b0cd28056#Ha51MFg7CfVDsRoAcjpbVWPJWPihWrvYZaMpQ6t32wh (expires after 1 year)
Environment details:
- Kubernetes version: v1.29.0+k3s1
- Kubernetes distribution: k3s
- mariadb-operator version: 0.28.1
- Install method: Helm
- Install flavor: custom
Additional context
k logs --context nebula -n databases mariadb-galera-0
Defaulted container "mariadb" out of: mariadb, agent, init (init)
2024-06-03 07:39:34+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.3.2+maria~ubu2204 started.
2024-06-03 07:39:34+00:00 [Note] [Entrypoint]: Initializing database files
2024-06-03 7:39:35 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !
To do so, start the server, then issue the following command:
'/usr/bin/mariadb-secure-installation'
which will also give you the option of removing the test
databases and anonymous user created by default. This is
strongly recommended for production servers.
See the MariaDB Knowledgebase at https://mariadb.com/kb
Please report any problems at https://mariadb.org/jira
The latest information about MariaDB is available at https://mariadb.org/.
Consider joining MariaDB's strong and vibrant community:
https://mariadb.org/get-involved/
2024-06-03 07:39:35+00:00 [Note] [Entrypoint]: Database files initialized
2024-06-03 07:39:35+00:00 [Note] [Entrypoint]: Starting temporary server
2024-06-03 07:39:35+00:00 [Note] [Entrypoint]: Waiting for server startup
----
k logs --context nebula -n databases mariadb-galera-0 -c agent
{"level":"info","ts":1717400415.948036,"msg":"Starting agent"}
{"level":"info","ts":1717400415.948769,"logger":"server","msg":"server listening","addr":":5555"}
{"level":"error","ts":1717400416.7483642,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net
/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"error","ts":1717400417.748818,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/
http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"error","ts":1717400418.7486272,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net
/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"error","ts":1717400419.749036,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/
http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"error","ts":1717400420.7489088,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net
/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
2024/06/03 07:40:21 "POST http://mariadb-galera-0.mariadb-galera-internal.databases.svc.cluster.local:5555/api/recovery HTTP/1.1" from 10.244.0.253:38368 - 500 56B in 4.9401536s
{"level":"error","ts":1717400421.718159,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/
http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"info","ts":1717400421.722613,"logger":"server","msg":"shutting down server"}
{"level":"error","ts":1717400422.719305,"logger":"handler.recovery","msg":"error recovering galera from recovery log","error":"error unmarshaling bootstrap: unable to find uuid and seqno: uuid=<nil> seqno=<nil>","stacktrace":"github.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered.func1\n\t/app/pkg/galera/agent/handler/recovery.go:119\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/poll.go:33\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).pollUntilRecovered\n\t/app/pkg/galera/agent/handler/recovery.go:116\ngithub.com/mariadb-operator/mariadb-operator/pkg/galera/agent/handler.(*Recovery).Start\n\t/app/pkg/galera/agent/handler/recovery.go:90\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/mariadb-operator/mariadb-operator/pkg/kubernetes/auth.(*KubernetesAuth).Handler-fm.(*KubernetesAuth).Handler.func1\n\t/app/pkg/kubernetes/auth/auth.go:73\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/
http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.init.0.RequestLogger.func1.1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/logger.go:55\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:73\ngithub.com/go-chi/chi/v5.(*Mux).Mount.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:327\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).routeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:459\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.Recoverer.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/recoverer.go:45\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5/middleware.(*Compressor).Handler-fm.(*Compressor).Handler.func1\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/middleware/compress.go:209\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2166\ngithub.com/go-chi/chi/v5.(*Mux).ServeHTTP\n\t/go/pkg/mod/github.com/go-chi/chi/v5@v5.0.12/mux.go:90\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039"}
{"level":"info","ts":1717400422.7236068,"logger":"server","msg":"graceful shutdown timed out"}
{"level":"error","ts":1717400422.7236707,"msg":"server error","error":"error shutting down server: context deadline exceeded","stacktrace":"github.com/mariadb-operator/mariadb-operator/cmd/agent.init.func1\n\t/app/cmd/agent/main.go:136\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:987\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\t/app/cmd/controller/main.go:380\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271"}
Hey there! Thanks for reporting.
bootstrap: unable to find uuid and seqno: uuid= seqno=
The agent is unable to parse the grastate.dat file, which is used to obtain the seqno and uuid of each node during the cluster recovery process. It seems like something has changed in MariaDB 11.4 that we don't support yet on the Galera agent side. We will look into this, but for now I suggest you stay on the previous version.
error unmarshaling bootstrap: unable to find uuid and seqno: uuid= seqno=
@samip5 could we get a copy of your grastate.dat to see why the Galera agent is unable to parse it? It is available at /var/lib/mysql/grastate.dat.
That might prove difficult, as I'm unable to even get inside the container because it's crashlooping.
There also seems to be an issue with writing a config file:
{"level":"debug","ts":1720175026.8486392,"logger":"galera.recovery.cluster","msg":"Error polling","controller":"mariadb","controllerGroup":"k8s.mariadb.com","controllerKind":"MariaDB","MariaDB":{"name":"mariadb-galera","namespace":"databases"},"namespace":"databases","name":"mariadb-galera","reconcileID":"ecd4de99-edc6-4a92-b057-c1aab0f2eab8","err":"error writing recovery config: open /etc/mysql/mariadb.conf.d/2-recovery.cnf: permission denied"}
Debug containers don't support volumes just yet, unfortunately. You can create a debug Pod yourself:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mariadb-debug
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: storage-mariadb-0
  containers:
    - name: debug-container
      image: busybox
      command: [ "sleep", "infinity" ]
      volumeMounts:
        - mountPath: /var/lib/mysql
          name: data
EOF
kubectl exec -it mariadb-debug -- cat /var/lib/mysql/grastate.dat
{"level":"debug","ts":1720175026.8486392,"logger":"galera.recovery.cluster","msg":"Error polling","controller":"mariadb","controllerGroup":"k8s.mariadb.com","controllerKind":"MariaDB","MariaDB":{"name":"mariadb-galera","namespace":"databases"},"namespace":"databases","name":"mariadb-galera","reconcileID":"ecd4de99-edc6-4a92-b057-c1aab0f2eab8","err":"error writing recovery config: open /etc/mysql/mariadb.conf.d/2-recovery.cnf: permission denied"}
This is definitely something: the agent won't be able to recover the Pod if it doesn't manage to put MariaDB in recovery mode. Could you provide the following?
- The StorageClass and CSIDriver that you are using
- Your Pod securityContext, to ensure that your agent has write permissions to /etc/mysql/mariadb.conf.d/
With the default StatefulSet securityContext configuration, user 999 should be able to write in /var/lib/mysql, as the kubelet will recursively change the permissions on all mounts based on fsGroup. This requires a CSIDriver compatible with the fsGroup feature. Here is a valid podSecurityContext to be set in the MariaDB resource:
podSecurityContext:
  fsGroup: 999
  runAsGroup: 999
  runAsNonRoot: true
  runAsUser: 999
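For reference, a minimal sketch of where this block sits in the MariaDB custom resource. The k8s.mariadb.com API group appears in the operator logs above; the v1alpha1 version and the metadata values are assumptions based on this thread, not taken from the reporter's actual manifest:

```yaml
apiVersion: k8s.mariadb.com/v1alpha1   # version is an assumption
kind: MariaDB
metadata:
  name: mariadb-galera                 # names taken from the thread
  namespace: databases
spec:
  podSecurityContext:
    fsGroup: 999
    runAsGroup: 999
    runAsNonRoot: true
    runAsUser: 999
```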
storageclass (galera-mariadb-galera-0 pvc):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: rook-ceph-cluster
    meta.helm.sh/release-namespace: rook-ceph
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2024-05-08T01:06:28Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: rook-ceph-cluster
    helm.toolkit.fluxcd.io/namespace: rook-ceph
  name: fast-ceph-filesystem
  resourceVersion: "48720"
  uid: 90ed9032-e69d-4420-b769-87b5c3d87585
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  fsName: ceph-filesystem
  pool: ceph-filesystem-fast-data0
provisioner: rook-ceph.cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
csidriver (for the above pvc):
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  creationTimestamp: "2024-05-08T01:08:24Z"
  name: rook-ceph.cephfs.csi.ceph.com
  resourceVersion: "52784"
  uid: ce63a7ba-47e0-4345-987a-fd5d077dec72
spec:
  attachRequired: true
  fsGroupPolicy: File
  podInfoOnMount: false
  requiresRepublish: false
  seLinuxMount: true
  storageCapacity: false
  volumeLifecycleModes:
    - Persistent
SecurityContext:
podSecurityContext:
  runAsUser: 568
  runAsGroup: 568
  fsGroup: 568
  fsGroupChangePolicy: OnRootMismatch
error unmarshaling bootstrap: unable to find uuid and seqno: uuid= seqno=
@samip5 could you get a copy of your grastate.dat so we can see why the Galera agent is unable to parse it? It is available in /var/lib/mysql/grastate.dat
No wonder it's a problem...
cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
safe_to_bootstrap: 0
The second one has a valid-ish grastate:
cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 9ca5802e-137c-11ef-b62c-7b147f2d3c67
seqno: -1
safe_to_bootstrap: 1
The second node's storage is using a different CSIDriver and StorageClass, but it has fsGroup support too. It's also Rook, but ceph-block instead.
grastate.dat looks good; this should not be the issue at this point.
CSIDriver has fsGroupPolicy: File, so it should be capable of changing the volume ownership.
As far as I know, our images should allow changing the running user to something other than 999, but just to rule out possible problems, could you try changing the podSecurityContext to the following?
podSecurityContext:
  runAsUser: 999
  runAsGroup: 999
  fsGroup: 999
Just to confirm, you are using mariadb:11.4.2, right?
cc @grooverdan just in case I am missing something regarding the images
Just to confirm, you are using mariadb:11.4.2, right?
I had rolled back to 11.3.2 after the auto-update borked it, but that hadn't fixed it, so it's still set to 11.3.2.
grastate.dat looks good; this should not be the issue at this point.
Are you sure? Even if the 0 mariadb-galera has the one with uuid that's zeros?
Are you sure? Even if the 0 mariadb-galera has the one with uuid that's zeros?
Yes, we are parsing the uuid here:
and as you can see, no error is returned: https://go.dev/play/p/VOK0aR5JvZG
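For reference, grastate.dat is a simple key/value file, and a zeroed-out uuid is still syntactically valid, so it parses without error. A minimal Go sketch of such parsing (illustrative only, not the operator's actual code; names like parseGrastate are made up here):

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// GaleraState holds the fields of a grastate.dat file.
type GaleraState struct {
	Version         string
	UUID            string
	Seqno           int64
	SafeToBootstrap bool
}

// parseGrastate parses the contents of grastate.dat. Note that a zeroed
// UUID (00000000-...) is syntactically valid and does not produce an error.
func parseGrastate(data string) (*GaleraState, error) {
	s := &GaleraState{}
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		// Skip blank lines and comments such as "# GALERA saved state".
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.TrimSpace(key) {
		case "version":
			s.Version = value
		case "uuid":
			s.UUID = value
		case "seqno":
			n, err := strconv.ParseInt(value, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("invalid seqno: %w", err)
			}
			s.Seqno = n
		case "safe_to_bootstrap":
			s.SafeToBootstrap = value == "1"
		}
	}
	if s.UUID == "" {
		return nil, errors.New("unable to find uuid and seqno")
	}
	return s, nil
}

func main() {
	state, err := parseGrastate(`# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
safe_to_bootstrap: 0`)
	if err != nil {
		panic(err)
	}
	// prints: 00000000-0000-0000-0000-000000000000 -1 false
	fmt.Println(state.UUID, state.Seqno, state.SafeToBootstrap)
}
```

So the all-zeros uuid alone would not trigger the "unable to find uuid and seqno" error; that error would come from a grastate.dat where the fields are missing entirely.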
galera-0:
k logs -n databases mariadb-galera-0
Defaulted container "mariadb" out of: mariadb, agent, init (init)
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.3.2+maria~ubu2204 started.
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Initializing database files
2024-07-05 12:16:08 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !
To do so, start the server, then issue the following command:
'/usr/bin/mariadb-secure-installation'
which will also give you the option of removing the test
databases and anonymous user created by default. This is
strongly recommended for production servers.
See the MariaDB Knowledgebase at https://mariadb.com/kb
Please report any problems at https://mariadb.org/jira
The latest information about MariaDB is available at https://mariadb.org/.
Consider joining MariaDB's strong and vibrant community:
https://mariadb.org/get-involved/
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Database files initialized
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Starting temporary server
2024-07-05 12:16:08+00:00 [Note] [Entrypoint]: Waiting for server startup
2024-07-05 12:16:40+00:00 [ERROR] [Entrypoint]: Unable to start server.
galera-1:
2024-07-05 12:15:21+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.3.2+maria~ubu2204 started.
2024-07-05 12:15:22+00:00 [Note] [Entrypoint]: MariaDB upgrade information missing, assuming required
2024-07-05 12:15:22+00:00 [Note] [Entrypoint]: MariaDB upgrade (mariadb-upgrade or creating healthcheck users) required, but skipped due to $MARIADB_AUTO_UPGRADE setting
2024-07-05 12:15:22 0 [Note] Starting MariaDB 11.3.2-MariaDB-1:11.3.2+maria~ubu2204 source revision 068a6819eb63bcb01fdfa037c9bf3bf63c33ee42 as process 1
2024-07-05 12:15:22 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2024-07-05 12:15:22 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2024-07-05 12:15:22 0 [Note] WSREP: wsrep_load(): Galera 26.4.16(r7dce5149) by Codership Oy <info@codership.com> loaded successfully.
2024-07-05 12:15:22 0 [Note] WSREP: Initializing allowlist service v1
2024-07-05 12:15:22 0 [Note] WSREP: Initializing event service v1
2024-07-05 12:15:22 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2024-07-05 12:15:22 0 [Note] WSREP: Found saved state: 9ca5802e-137c-11ef-b62c-7b147f2d3c67:-1, safe_to_bootstrap: 1
2024-07-05 12:15:22 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 9ca5802e-137c-11ef-b62c-7b147f2d3c67
Seqno: 984983 - 1038778
Offset: 85065224
Synced: 1
2024-07-05 12:15:22 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 9ca5802e-137c-11ef-b62c-7b147f2d3c67, offset: 85065224
2024-07-05 12:15:22 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2024-07-05 12:15:22 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2024-07-05 12:15:22 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 984983-1038778
2024-07-05 12:15:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/132880832 bytes) complete.
2024-07-05 12:15:22 0 [Note] WSREP: Recovering GCache ring buffer: found 3/53799 locked buffers
2024-07-05 12:15:22 0 [Note] WSREP: Recovering GCache ring buffer: free space: 1337536/134217728
2024-07-05 12:15:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (132880832/132880832 bytes) complete.
2024-07-05 12:15:22 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.244.8.106; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.keep_plaintext_size = 128M; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr
2024-07-05 12:15:22 0 [Note] WSREP: Start replication
2024-07-05 12:15:22 0 [Note] WSREP: Connecting with bootstrap option: 0
2024-07-05 12:15:22 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2024-07-05 12:15:22 0 [Note] WSREP: protonet asio version 0
2024-07-05 12:15:22 0 [Note] WSREP: Using CRC-32C for message checksums.
2024-07-05 12:15:22 0 [Note] WSREP: backend: asio
2024-07-05 12:15:22 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2024-07-05 12:15:22 0 [Note] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2024-07-05 12:15:22 0 [Note] WSREP: restore pc from disk failed
2024-07-05 12:15:22 0 [Note] WSREP: GMCast version 0
2024-07-05 12:15:22 0 [Note] WSREP: (408ad6c0-bef7, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2024-07-05 12:15:22 0 [Note] WSREP: (408ad6c0-bef7, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2024-07-05 12:15:22 0 [Note] WSREP: EVS version 1
2024-07-05 12:15:22 0 [Note] WSREP: gcomm: connecting to group 'mariadb-operator', peer 'mariadb-galera-0.mariadb-galera-internal.databases.svc.cluster.local:,mariadb-galera-1.mariadb-galera-internal.databases.svc.cluster.local:'
2024-07-05 12:15:25 0 [Note] WSREP: EVS version upgrade 0 -> 1
2024-07-05 12:15:25 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2024-07-05 12:15:25 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2024-07-05 12:15:25 0 [Note] WSREP: view(view_id(NON_PRIM,408ad6c0-bef7,1) memb {
408ad6c0-bef7,0
} joined {
} left {
} partitioned {
})
2024-07-05 12:15:25 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50173S), skipping check
2024-07-05 12:15:55 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2024-07-05 12:15:55 0 [Note] WSREP: view((empty))
2024-07-05 12:15:55 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at ./gcomm/src/pc.cpp:connect():160
2024-07-05 12:15:55 0 [ERROR] WSREP: ./gcs/src/gcs_core.cpp:gcs_core_open():221: Failed to open backend connection: -110 (Connection timed out)
2024-07-05 12:15:56 0 [ERROR] WSREP: ./gcs/src/gcs.cpp:gcs_open():1674: Failed to open channel 'mariadb-operator' at 'gcomm://mariadb-galera-0.mariadb-galera-internal.databases.svc.cluster.local,mariadb-galera-1.mariadb-galera-internal.databases.svc.cluster.local': -110 (Connection timed out)
2024-07-05 12:15:56 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2024-07-05 12:15:56 0 [ERROR] WSREP: wsrep::connect(gcomm://mariadb-galera-0.mariadb-galera-internal.databases.svc.cluster.local,mariadb-galera-1.mariadb-galera-internal.databases.svc.cluster.local) failed: 7
2024-07-05 12:15:56 0 [ERROR] Aborting
So how do I switch the primary to be the 2nd?
Ah, it managed to figure itself out.
I only had to change the securityContext and delete the Galera PVC for the 0 pod, as well as change the timeout.
change the securitycontext
Change it to what? To use the 999 user and group? This would indicate that the image might not be compatible with non-999 users/groups.
delete the galera pvc for the 0 one
Could this mean that the CSIDriver, not sure if by definition or related to the implementation (rook), only changes the permissions when provisioning the PVC?
changed the timeout.
Yeah, podRecoveryTimeout = 3m does not fit all sizes; it is recommended that you adapt it to your cluster's needs.
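For anyone landing here: the recovery timeouts are set under the Galera section of the MariaDB resource. A sketch of what raising podRecoveryTimeout could look like (field names as in recent mariadb-operator CRDs, the 10m value is illustrative; verify both against the CRD version you have installed):

```yaml
apiVersion: k8s.mariadb.com/v1alpha1  # API group may differ on older operator versions
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  galera:
    enabled: true
    recovery:
      enabled: true
      podRecoveryTimeout: 10m  # default is 3m; tune to your cluster
```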
Change it to what? To use the 999 user and group? This would indicate that the image might not be compatible with non-999 users/groups.
I changed it to 999. It more than likely means that the folder is owned in a way that users/groups other than 999 cannot modify it.
Could this mean that the CSIDriver, not sure if by definition or related to the implementation (rook), only changes the permissions when provisioning the PVC?
It's not a CSIDriver thing, but k3s/containerd thing to my understanding.
I changed it to 999.
Great thanks for confirming.
It's not a CSIDriver thing, but k3s/containerd thing to my understanding.
Perhaps, need to further investigate.
Thank you very much for helping troubleshoot this; your contributions are very much appreciated. Are we good to close the issue?
Perhaps, need to further investigate.
Possibly related: https://github.com/kubevirt/containerized-data-importer/blob/main/doc/block_cri_ownership_config.md
Are we good to close the issue?
I would say yes, as my workload using MariaDB (WordPress) started to work again.