[Bug] Adding Node to Galera cluster fails
Closed this issue · 4 comments
Documentation
- I acknowledge that I have read the relevant documentation.
Describe the bug
When scaling up an existing cluster that is running Galera, the node fails to join and then goes into a crash loop.
Expected behaviour
The new node performs an SST, joins the cluster, and reaches the Synced state.
Steps to reproduce the bug
- Create a new Galera cluster
- Add some data to it
- Scale it up, or delete the PVCs associated with one of the nodes
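The steps above can be sketched with kubectl against a cluster managed by mariadb-operator. Note these commands are a hypothetical sketch: the resource name `mariadb-gnew` is taken from the logs, and the PVC name follows the operator's usual `storage-<name>-<ordinal>` pattern, which may differ in your setup.

```shell
# Scale the MariaDB resource up by one replica (resource name assumed from the logs)
kubectl -n mariadb patch mariadb mariadb-gnew --type merge -p '{"spec":{"replicas":3}}'

# Or, alternatively, delete the PVC backing one of the nodes and its pod,
# forcing a full state transfer when the pod is recreated
kubectl -n mariadb delete pvc storage-mariadb-gnew-2
kubectl -n mariadb delete pod mariadb-gnew-2
```

Either path leaves the affected node with an empty local state, so it must request an SST from a donor when it rejoins.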
Instead, the new node in the Galera cluster just keeps crashing. Logs:
2024-04-23 19:48:02+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.3.2+maria~ubu2204 started.
2024-04-23 19:48:02+00:00 [Warn] [Entrypoint]: /sys/fs/cgroup///memory.pressure not writable, functionality unavailable to MariaDB
2024-04-23 19:48:02+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2024-04-23 19:48:02+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.3.2+maria~ubu2204 started.
2024-04-23 19:48:03+00:00 [Note] [Entrypoint]: MariaDB upgrade information missing, assuming required
2024-04-23 19:48:03+00:00 [Note] [Entrypoint]: MariaDB upgrade (mariadb-upgrade or creating healthcheck users) required, but skipped due to $MARIADB_AUTO_UPGRADE setting
2024-04-23 19:48:03 0 [Note] Starting MariaDB 11.3.2-MariaDB-1:11.3.2+maria~ubu2204 source revision 068a6819eb63bcb01fdfa037c9bf3bf63c33ee42 as process 1
2024-04-23 19:48:03 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2024-04-23 19:48:03 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2024-04-23 19:48:03 0 [Note] WSREP: wsrep_load(): Galera 26.4.16(r7dce5149) by Codership Oy <info@codership.com> loaded successfully.
2024-04-23 19:48:03 0 [Note] WSREP: Initializing allowlist service v1
2024-04-23 19:48:03 0 [Note] WSREP: Initializing event service v1
2024-04-23 19:48:03 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2024-04-23 19:48:03 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2024-04-23 19:48:03 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 003b272f-01a9-11ef-beba-026f3aace78f
Seqno: -1 - -1
Offset: -1
Synced: 0
2024-04-23 19:48:03 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 003b272f-01a9-11ef-beba-026f3aace78f, offset: -1
2024-04-23 19:48:03 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2024-04-23 19:48:03 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2024-04-23 19:48:03 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 675-1409
2024-04-23 19:48:03 0 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/132302552 bytes) complete.
2024-04-23 19:48:03 0 [Note] WSREP: Recovering GCache ring buffer: found 7/742 locked buffers
2024-04-23 19:48:03 0 [Note] WSREP: Recovering GCache ring buffer: free space: 1916680/134217728
2024-04-23 19:48:03 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (132302552/132302552 bytes) complete.
2024-04-23 19:48:03 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.233.91.138; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.keep_plaintext_size = 128M; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0
2024-04-23 19:48:03 0 [Note] WSREP: Start replication
2024-04-23 19:48:03 0 [Note] WSREP: Connecting with bootstrap option: 0
2024-04-23 19:48:03 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2024-04-23 19:48:03 0 [Note] WSREP: protonet asio version 0
2024-04-23 19:48:03 0 [Note] WSREP: Using CRC-32C for message checksums.
2024-04-23 19:48:03 0 [Note] WSREP: backend: asio
2024-04-23 19:48:03 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2024-04-23 19:48:03 0 [Note] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2024-04-23 19:48:03 0 [Note] WSREP: restore pc from disk failed
2024-04-23 19:48:03 0 [Note] WSREP: GMCast version 0
2024-04-23 19:48:03 0 [Note] WSREP: (65c7b769-a4b9, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2024-04-23 19:48:03 0 [Note] WSREP: (65c7b769-a4b9, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2024-04-23 19:48:03 0 [Note] WSREP: EVS version 1
2024-04-23 19:48:03 0 [Note] WSREP: gcomm: connecting to group 'mariadb-operator', peer 'mariadb-gnew-0.mariadb-gnew-internal.mariadb.svc.newcluster.local:,mariadb-gnew-1.mariadb-gnew-internal.mariadb.svc.newcluster.local:,mariadb-gnew-2.mariadb-gnew-internal.mariadb.svc.newcluster.local:'
2024-04-23 19:48:03 0 [Note] WSREP: (65c7b769-a4b9, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://10.233.91.138:4567
2024-04-23 19:48:03 0 [Note] WSREP: (65c7b769-a4b9, 'tcp://0.0.0.0:4567') connection established to 1f792853-896d tcp://10.233.101.184:4567
2024-04-23 19:48:03 0 [Note] WSREP: (65c7b769-a4b9, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2024-04-23 19:48:04 0 [Note] WSREP: EVS version upgrade 0 -> 1
2024-04-23 19:48:04 0 [Note] WSREP: declaring 1f792853-896d at tcp://10.233.101.184:4567 stable
2024-04-23 19:48:04 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2024-04-23 19:48:04 0 [Note] WSREP: Node 1f792853-896d state prim
2024-04-23 19:48:04 0 [Note] WSREP: view(view_id(PRIM,1f792853-896d,22) memb {
1f792853-896d,0
65c7b769-a4b9,0
} joined {
} left {
} partitioned {
})
2024-04-23 19:48:04 0 [Note] WSREP: save pc into disk
2024-04-23 19:48:04 0 [Note] WSREP: discarding pending addr without UUID: tcp://10.233.69.10:4567
2024-04-23 19:48:04 0 [Note] WSREP: gcomm: connected
2024-04-23 19:48:04 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2024-04-23 19:48:04 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2024-04-23 19:48:04 0 [Note] WSREP: Opened channel 'mariadb-operator'
2024-04-23 19:48:04 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2024-04-23 19:48:04 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2024-04-23 19:48:04 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 66165298-01aa-11ef-91b2-43347ae01c3b
2024-04-23 19:48:04 0 [Note] WSREP: STATE EXCHANGE: got state msg: 66165298-01aa-11ef-91b2-43347ae01c3b from 0 (mariadb-gnew-2)
2024-04-23 19:48:04 0 [Note] WSREP: Initializing config service v1
2024-04-23 19:48:04 1 [Note] WSREP: Starting rollbacker thread 1
2024-04-23 19:48:04 2 [Note] WSREP: Starting applier thread 2
2024-04-23 19:48:04 0 [Note] WSREP: STATE EXCHANGE: got state msg: 66165298-01aa-11ef-91b2-43347ae01c3b from 1 (mariadb-gnew-1)
2024-04-23 19:48:04 0 [Note] WSREP: Quorum results:
version = 6,
component = PRIMARY,
conf_id = 20,
members = 1/2 (joined/total),
act_id = 1416,
last_appl. = 1284,
protocols = 2/10/4 (gcs/repl/appl),
vote policy= 0,
group UUID = 003b272f-01a9-11ef-beba-026f3aace78f
2024-04-23 19:48:04 0 [Note] WSREP: Flow-control interval: [23, 23]
2024-04-23 19:48:04 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1417)
2024-04-23 19:48:04 0 [Note] WSREP: Deinitializing config service v1
2024-04-23 19:48:04 2 [Note] WSREP: ####### processing CC 1417, local, ordered
2024-04-23 19:48:04 2 [Note] WSREP: Process first view: 003b272f-01a9-11ef-beba-026f3aace78f my uuid: 65c7b769-01aa-11ef-a4b9-4fcbab8908cb
2024-04-23 19:48:04 2 [Note] WSREP: Server mariadb-gnew-1 connected to cluster at position 003b272f-01a9-11ef-beba-026f3aace78f:1417 with ID 65c7b769-01aa-11ef-a4b9-4fcbab8908cb
2024-04-23 19:48:04 2 [Note] WSREP: Server status change disconnected -> connected
2024-04-23 19:48:04 2 [Note] WSREP: ####### My UUID: 65c7b769-01aa-11ef-a4b9-4fcbab8908cb
2024-04-23 19:48:04 2 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2024-04-23 19:48:04 0 [Note] WSREP: Service thread queue flushed.
2024-04-23 19:48:04 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
2024-04-23 19:48:04 2 [Note] WSREP: State transfer required:
Group state: 003b272f-01a9-11ef-beba-026f3aace78f:1417
Local state: 00000000-0000-0000-0000-000000000000:-1
2024-04-23 19:48:04 2 [Note] WSREP: Server status change connected -> joiner
2024-04-23 19:48:04 0 [Note] WSREP: Joiner monitor thread started to monitor
2024-04-23 19:48:04 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '10.233.91.138' --datadir '/var/lib/mysql/' --parent 1 --progress 0'
WSREP_SST: [INFO] mariabackup SST started on joiner (20240423 19:48:04.639)
WSREP_SST: [INFO] SSL configuration: CA='', CAPATH='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20240423 19:48:04.791)
WSREP_SST: [INFO] Progress reporting tool pv not found in path: /usr//bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/bin (20240423 19:48:05.150)
WSREP_SST: [INFO] Disabling all progress/rate-limiting (20240423 19:48:05.154)
WSREP_SST: [INFO] Streaming with mbstream (20240423 19:48:05.187)
WSREP_SST: [INFO] Using socat as streamer (20240423 19:48:05.191)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql/sst_in_progress (20240423 19:48:05.197)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:05.252)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:06.283)
2024-04-23 19:48:07 0 [Note] WSREP: (65c7b769-a4b9, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:07.315)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:08.350)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:09.381)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:10.423)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:11.458)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:12.500)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:13.536)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:14.574)
WSREP_SST: [ERROR] previous SST script still running. (20240423 19:48:14.584)
2024-04-23 19:48:14 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_mariabackup --role 'joiner' --address '10.233.91.138' --datadir '/var/lib/mysql/' --parent 1 --progress 0
Read: '(null)'
2024-04-23 19:48:14 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '10.233.91.138' --datadir '/var/lib/mysql/' --parent 1 --progress 0: 114 (Operation already in progress)
2024-04-23 19:48:14 2 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.
2024-04-23 19:48:14 2 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.
2024-04-23 19:48:14 2 [Note] WSREP: ReplicatorSMM::abort()
2024-04-23 19:48:14 2 [Note] WSREP: Closing send monitor...
2024-04-23 19:48:14 2 [Note] WSREP: Closed send monitor.
2024-04-23 19:48:14 2 [Note] WSREP: gcomm: terminating thread
2024-04-23 19:48:14 2 [Note] WSREP: gcomm: joining thread
2024-04-23 19:48:14 2 [Note] WSREP: gcomm: closing backend
2024-04-23 19:48:15 2 [Note] WSREP: view(view_id(NON_PRIM,1f792853-896d,22) memb {
65c7b769-a4b9,0
} joined {
} left {
} partitioned {
1f792853-896d,0
})
2024-04-23 19:48:15 2 [Note] WSREP: PC protocol downgrade 1 -> 0
2024-04-23 19:48:15 2 [Note] WSREP: view((empty))
2024-04-23 19:48:15 2 [Note] WSREP: gcomm: closed
2024-04-23 19:48:15 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2024-04-23 19:48:15 0 [Note] WSREP: Flow-control interval: [16, 16]
2024-04-23 19:48:15 0 [Note] WSREP: Received NON-PRIMARY.
2024-04-23 19:48:15 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 1417)
2024-04-23 19:48:15 0 [Note] WSREP: New SELF-LEAVE.
2024-04-23 19:48:15 0 [Note] WSREP: Flow-control interval: [0, 0]
2024-04-23 19:48:15 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2024-04-23 19:48:15 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1417)
2024-04-23 19:48:15 0 [Note] WSREP: RECV thread exiting 0: Success
2024-04-23 19:48:15 2 [Note] WSREP: recv_thread() joined.
2024-04-23 19:48:15 2 [Note] WSREP: Closing replication queue.
2024-04-23 19:48:15 2 [Note] WSREP: Closing slave action queue.
2024-04-23 19:48:15 2 [Note] WSREP: mariadbd: Terminated.
240423 19:48:15 [ERROR] mysqld got signal 11 ;
Sorry, we probably made a mistake, and this is a bug.
Your assistance in bug reporting will enable us to fix this for the next release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 11.3.2-MariaDB-1:11.3.2+maria~ubu2204 source revision: 068a6819eb63bcb01fdfa037c9bf3bf63c33ee42
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 336992 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f6f5c000c68
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f6f81425c68 thread_stack 0x49000
Printing to addr2line failed
mariadbd(my_print_stacktrace+0x32)[0x55a1a86358a2]
mariadbd(handle_fatal_signal+0x478)[0x55a1a8106488]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6f83bef520]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x178)[0x7f6f83bd5898]
/usr/lib/galera/libgalera_smm.so(+0x157602)[0x7f6f83671602]
/usr/lib/galera/libgalera_smm.so(+0x700e1)[0x7f6f8358a0e1]
/usr/lib/galera/libgalera_smm.so(+0x6cc94)[0x7f6f83586c94]
/usr/lib/galera/libgalera_smm.so(+0x8b311)[0x7f6f835a5311]
/usr/lib/galera/libgalera_smm.so(+0x604a0)[0x7f6f8357a4a0]
/usr/lib/galera/libgalera_smm.so(+0x48261)[0x7f6f83562261]
mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x55a1a86f5592]
mariadbd(+0xd93e31)[0x55a1a83c5e31]
mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x55a1a83b3a7b]
mariadbd(+0xd05f86)[0x55a1a8337f86]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f6f83c41ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f6f83cd3850]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 2
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on,cset_narrowing=off,sargable_casefold=on
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
information that should help you find out what is causing the crash.
We think the query pointer is invalid, but we will try to print it anyway.
Query:
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 0 bytes
Max resident set unlimited unlimited bytes
Max processes unlimited unlimited processes
Max open files 65535 65535 files
Max locked memory unlimited unlimited bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 192953 192953 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Core pattern: core
Kernel version: Linux version 5.10.0-28-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.209-2 (2024-01-31)
I'm not sure if I'm doing something insanely wrong, or if my MariaDB cluster is just broken, lol.
Here's my deployment YAML again:
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
name: mariadb-fixed
namespace: mariadb
annotations:
argocd.argoproj.io/compare-options: IgnoreExtraneous
argocd.argoproj.io/sync-options: Prune=false
spec:
rootPasswordSecretKeyRef:
name: mariadb-creds
key: root-password
podSecurityContext:
runAsUser: 0
storage:
size: 30Gi
storageClassName: local-path
resizeInUseVolumes: true
waitForVolumeResize: true
volumeClaimTemplate:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 30Gi
storageClassName: local-path
image: mariadb:11.3.2
replicas: 3
galera:
enabled: true
primary:
automaticFailover: true
replicaThreads: 1
agent:
image: ghcr.io/mariadb-operator/mariadb-operator:v0.0.27
port: 5555
kubernetesAuth:
enabled: true
gracefulShutdownTimeout: 1s
recovery:
enabled: true
minClusterSize: 40%
clusterHealthyTimeout: 30s
clusterBootstrapTimeout: 10m0s
podRecoveryTimeout: 3m0s
podSyncTimeout: 3m0s
initContainer:
image: ghcr.io/mariadb-operator/mariadb-operator:v0.0.27
initJob:
labels:
sidecar.istio.io/inject: "false"
config:
reuseStorageVolume: false
volumeClaimTemplate:
resources:
requests:
storage: 300Mi
accessModes:
- ReadWriteOnce
service:
type: LoadBalancer
annotations:
metallb.universe.tf/ip-allocated-from-pool: first-pool
metallb.universe.tf/loadBalancerIPs: 10.11.0.30
connection:
secretName: mariadb-fixed-conn
secretTemplate:
key: dsn
primaryService:
type: LoadBalancer
annotations:
metallb.universe.tf/ip-allocated-from-pool: first-pool
metallb.universe.tf/loadBalancerIPs: 10.11.0.29
primaryConnection:
secretName: mariadb-fixed-conn-primary
secretTemplate:
key: dsn
secondaryService:
type: LoadBalancer
annotations:
metallb.universe.tf/ip-allocated-from-pool: first-pool
metallb.universe.tf/loadBalancerIPs: 10.11.0.28
secondaryConnection:
secretName: mariadb-fixed-conn-secondary
secretTemplate:
key: dsn
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- topologyKey: "kubernetes.io/hostname"
tolerations:
- key: "mariadb.mmontes.io/ha"
operator: "Exists"
effect: "NoSchedule"
updateStrategy:
type: RollingUpdate
myCnf: |
[mariadb]
bind-address=*
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
max_allowed_packet=256M
resources:
requests:
cpu: 300m
memory: 256Mi
limits:
memory: 1Gi
metrics:
enabled: true
---
apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
name: mariadb-fixed-backup-scheduled
namespace: mariadb
spec:
mariaDbRef:
name: mariadb-fixed
schedule:
cron: "0 */12 * * *" # every 12 hours
suspend: false
maxRetention: 1440h # 60 days
storage:
s3:
bucket: mysql-backups
endpoint: minio.minio.svc.newcluster.local:9000
region: us-east-1
accessKeyIdSecretKeyRef:
name: minio-creds
key: MINIO_ACCESS_KEY
secretAccessKeySecretKeyRef:
name: minio-creds
key: MINIO_SECRET_KEY
tls:
enabled: false
args:
- --single-transaction
- --all-databases
logLevel: info
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 300m
memory: 512Mi
Hey there @perfectra1n !
I've just tested this locally and managed to upscale a Galera cluster with 1GB of data generated with sysbench.
[INFO] previous SST is not completed, waiting for it to exit (20240423 19:48:14.574)
The issue is that your node has a pending SST, and it will keep restarting until that SST succeeds. The operator doesn't handle this situation yet, but it is on our radar.
You can cancel the SST by:
- Exec into the Pod and delete `/var/lib/mysql/wsrep_sst.pid`
- Restart the Pod
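The two recovery steps above can be sketched with kubectl. The pod name below is an example taken from the logs; substitute the name of the pod that is stuck in the SST crash loop.

```shell
# Remove the stale SST pid file from the joiner pod (pod name is an example)
kubectl -n mariadb exec mariadb-gnew-1 -- rm -f /var/lib/mysql/wsrep_sst.pid

# Restart the pod; the StatefulSet recreates it and the SST is retried cleanly
kubectl -n mariadb delete pod mariadb-gnew-1
```

Deleting the pid file clears the "previous SST is not completed" check in `wsrep_sst_mariabackup`, so the recreated pod can start a fresh state transfer instead of failing with error 114 (Operation already in progress).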
If that works, let's close this issue and track everything related to SST recovery in #425.
Thanks!
Gotcha, well I'll be sure to give those a shot next time I use Galera. For now I've just scaled my DB down to 1 node so that it at least works in the meantime.
This issue is stale because it has been open 30 days with no activity.
This issue was closed because it has been stalled for 10 days with no activity.