Description
I have a three-node MariaDB 10.11 Galera cluster running in Docker (host networking) across three sites connected by VPN:
- DC1 (10.82.1.11)
- DC2 (192.168.138.151)
- DC3 (192.168.128.149)
When I bring the nodes up one at a time without specifying a donor on DC2, the second join always fails during IST and never falls back to SST. To work around this, I have to set the following on DC2:
wsrep_sst_donor="dc1,dc3"
With that, both DC2→DC1 and DC2→DC3 joins now succeed. However, if DC1 is temporarily down, DC2→DC3 still fails.
Reproduction steps
- Start DC1 alone → PRIMARY/SYNCED
- Start DC2 without wsrep_sst_donor → joins from DC1 via SST → OK
- Start DC3 without wsrep_sst_donor → IST from DC2 (or DC1) → timeout → abort
- Add on DC2: wsrep_sst_donor="dc1,dc3"
- Restart DC3 → joins from DC1 (preferred) or DC2 fallback → OK
- Stop DC1, restart DC3 → attempts from DC1 first, fails, no fallback to DC2 → abort
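To make the fallback behavior I expected in the last step concrete, here is a toy model of how I read the documented `wsrep_sst_donor` list semantics: names are tried in order, and a trailing comma (or an empty value) lets any other available node act as donor. This is only a sketch under that assumption, not Galera's actual selection code; `pick_donor` is a hypothetical helper, and real donor selection also depends on node state and segments.

```python
from __future__ import annotations

from typing import Optional


def pick_donor(donor_setting: str, available: list[str], joiner: str) -> Optional[str]:
    """Toy model of the documented wsrep_sst_donor list semantics.

    Assumption: listed names are tried in order; a trailing comma or an
    empty setting means "any other available node is acceptable too".
    This is NOT Galera's implementation.
    """
    setting = donor_setting.strip()
    names = [n.strip() for n in setting.split(",")]
    allow_any = setting == "" or setting.endswith(",")

    # A node can never be its own donor.
    candidates = [n for n in available if n != joiner]

    for name in names:
        if name and name in candidates:
            return name
    if allow_any and candidates:
        return candidates[0]
    return None


# DC1 is down, DC3 is joining, only DC2 can donate.
print(pick_donor("dc1,dc2", ["dc2", "dc3"], "dc3"))  # dc2
print(pick_donor("dc1", ["dc2", "dc3"], "dc3"))      # None
print(pick_donor("dc1,", ["dc2", "dc3"], "dc3"))     # dc2
```

Under this model, a donor list that does not name DC2 and has no trailing comma leaves DC3 with no donor while DC1 is down, which is the case I would like someone to confirm or correct.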
Sample log when dc3 joins after dc2 (no donor set):
2025-07-28 15:43:05 [Note] WSREP: IST uuid:… f: 34, l: 35
2025-07-28 15:43:05 [Note] WSREP: requested state transfer from 'dc2,dc1'. Selected dc2 (SYNCED) as donor
2025-07-28 15:45:20 [Warning] WSREP: State transfer to dc3 failed: Operation timed out
2025-07-28 15:45:20 [ERROR] WSREP: Will never receive state. Need to abort.
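For reference, the gap between donor selection and the timeout in this excerpt can be computed directly from the timestamps. The snippet below is a small sketch that does this for the two relevant lines copied from the log above; `seconds_between` is a hypothetical helper and the marker strings are just substrings of those lines.

```python
import re
from datetime import datetime

# The two relevant lines, copied from the sample log.
LOG = """\
2025-07-28 15:43:05 [Note] WSREP: requested state transfer from 'dc2,dc1'. Selected dc2 (SYNCED) as donor
2025-07-28 15:45:20 [Warning] WSREP: State transfer to dc3 failed: Operation timed out
"""

STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")


def seconds_between(log_text, start_marker, end_marker):
    """Return seconds from the first line containing start_marker to the
    next later line containing end_marker, or None if either is missing."""
    start = end = None
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if start is None:
            if start_marker in line:
                start = ts
        elif end_marker in line:
            end = ts
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


elapsed = seconds_between(LOG, "Selected", "State transfer to")
print(elapsed)  # 135.0
```

So the transfer is declared failed 2 minutes 15 seconds after dc2 is selected, well inside the 15 minute systemd start timeout.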
My configuration (identical on each node):
[mysqld]
wsrep_on=ON
wsrep_cluster_address="gcomm://10.82.1.11,192.168.138.151,192.168.128.149"
wsrep_node_name=<dcX>
wsrep_node_address=<IP_VPN_dcX>
wsrep_sst_method=rsync
wsrep_sst_donor="dcY,dcZ"
gcache.size=1G
I have confirmed:
- All ports (4567, 4568, 4444) are reachable pairwise over the VPN.
- No host firewalls or Docker network restrictions.
- rsync is installed, and wsrep_sst_auth works where it is used.
- Systemd TimeoutStartSec increased to 15 min.
- Enabling wsrep_debug=ON shows IST attempts but no SST fallback logs.
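The pairwise port check can be reproduced with something like the sketch below, run from inside each container. The node names and IPs are the ones listed above; `tcp_port_open` and `check_peers` are hypothetical helpers, and the 3 second timeout is an arbitrary choice. One caveat, as far as I understand it: 4568 (IST) and 4444 (SST) only have a listener while a transfer is in progress, so "connection refused" on those ports against an idle node does not by itself indicate a network problem, whereas 4567 should accept connections whenever the node is up.

```python
import socket

NODES = {
    "dc1": "10.82.1.11",
    "dc2": "192.168.138.151",
    "dc3": "192.168.128.149",
}
PORTS = (4567, 4568, 4444)  # group communication, IST, SST


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peers(local_name, nodes=NODES, ports=PORTS):
    """Probe every port on every node except the local one.

    Returns a dict mapping (peer_name, port) to True/False.
    """
    results = {}
    for name, ip in nodes.items():
        if name == local_name:
            continue
        for port in ports:
            results[(name, port)] = tcp_port_open(ip, port)
    return results
```

For example, on DC3 I would call `check_peers("dc3")` and print the resulting dictionary, once while the cluster is idle and once while a join is being attempted.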
I’d be very grateful for any insight into why IST never falls back to SST in this scenario, and why forcing a donor still fails when the first donor is unavailable. Thank you!