How to recover from partitioning after 'pause_if_all_down' is configurable
Now that 'pause_if_all_down' accepts a list of preferred nodes, it is
possible that these nodes are spread across multiple partitions. For
example, suppose we have nodes A and B in datacenter #1 and nodes C and
D in datacenter #2, and we set {pause_if_all_down, [A, C]}. If the link
between both datacenters is lost, A/B and C/D form two partitions.
RabbitMQ continues to run at both sites because all nodes see at least
one node from the preferred nodes list. When the link comes back, we
need to handle the recovery.
Therefore, a user can specify the strategy:
o {pause_if_all_down, [...], ignore} (default)
o {pause_if_all_down, [...], autoheal}
This third parameter is mandatory.
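For illustration, here is what such a setting could look like in a
classic-format rabbitmq.config file; the node names are hypothetical
and stand in for the preferred nodes A and C above:

```erlang
%% rabbitmq.config (classic Erlang-term format); node names are
%% examples only.
[
  {rabbit, [
    %% Pause a node unless it can reach rabbit@node_a or rabbit@node_c.
    %% The third tuple element selects the recovery strategy; 'ignore'
    %% is the default behaviour described below.
    {cluster_partition_handling,
      {pause_if_all_down, ['rabbit@node_a', 'rabbit@node_c'], ignore}}
  ]}
].
```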
If the strategy is 'ignore', RabbitMQ is started again on paused nodes,
as soon as they see another node from the preferred nodes list. This is
the default behaviour.
If the strategy is 'autoheal', RabbitMQ is started again, like in
'ignore' mode, but when all nodes are up, autohealing kicks in as well.
Compared to plain 'autoheal' mode, the chance of losing data is low
because paused nodes never drifted away from the cluster. When they
start again, they join the cluster and resume operations as any starting
node.
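Reusing the two-datacenter example from above, the 'autoheal' variant
could be configured as follows (again with hypothetical node names
standing in for A and C):

```erlang
%% Same preferred nodes as before, but additionally run the autohealer
%% once all nodes are back up after the partition is resolved.
[
  {rabbit, [
    {cluster_partition_handling,
      {pause_if_all_down, ['rabbit@A', 'rabbit@C'], autoheal}}
  ]}
].
```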