#SWIM
This is an attempt to implement SWIM in GO.
I recommend you to read the paper in References sector first, i don't think i can explain better than the paper itself.
Here is the TL;DR version:
In distributed computing, a failure detector is an application or a subsystem that is responsible for detection of node failures or crashes in a distributed system.
A Traditional way to implement it is using heartbeat protocol: which either impose network loads that grow quadratically with group size, or compromise response times or false positive frequency w.r.t. detecting process crashes
The new system, called SWIM, provides a membership substrate that:
- imposes a constant message load per group member;
- detects a process failure in an (expected) constant time at some non-faulty process in the group;
- provides a deterministic bound (as a function of group size) on the local time that a non-faulty process takes to detect failure of another process;
- propagates membership updates, including information about failures, in infection-style (also gossip-style or epidemic-style); the dissemination latency in the group grows slowly (logarithmically) with the number of members;
- provides a mechanism to reduce the rate of false positives by “suspecting” a process before “declaring” it as failed within the group
SWIM has two components: (1) a Failure Detector Component, that detects failures of members, and (2) a Dissemination Component, that disseminates information about members that have recently either joined or left the group, or failed.
Given we have a cluster of n nodes. Each node has information about m nodes in the cluster (m <= n).
Every T' time unit (which called a protocol period), at node Mi
-
Increase the period
pr -
Select a random node in m nodes, called it Mj and ping it
ping(Mi, Mj , pr). Wait for the worst-case message round-trip for anack(Mi, Mj , pr).- If the ack message come back, that means Mj is still alive.
- If not, move to step 3
-
Select
knodes in m nodes, ask them to pingMjping-req(Mi, Mj , pr)- If one of them receive the ack message
ack(Mi, Mj , pr), the node is still alive - If no one receive the ack until the end of the period, declared Mj as failed.
- If one of them receive the ack message
-
To reduce the false positive in step 3, instead of mark
Mjas failed immediately, we will mark it assuspected. After a prespecified time-out, it will be declared asfailed. But if it response within the timeout, it will be declared asaliveagain.
As any given time, at node Mi
On receipt of ping-req(Mm, Mj , pr) message (Mj != Mi), send a ping(Mi, Mj , Mm, pr) message to Mj
On receipt of ack(Mi, Mj , Mm, pr) message from Mj, send an ack(Mm, Mj , pr) message to received to Mm
On receipt of ping(Mm, Mi, Ml, pr) message from Mm, reply with an ack(Mm, Mi, Ml, pr) message to Mm
On receipt of ping(Mm, Mi, pr) message from Mm, reply with an ack(Mm, Mi, pr) message to Mm