SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol

Unknowns

virtually synchronous multicast protocols(drastic reduction in performance, and partitioning, at beyond a few dozen members)
stable rate of false positives: Having a stable rate of false positives is better than having a stable rate of false negatives
infection style
Contact member ?
Static coordinator ?
why toss coin ?
anti entropy
- merkle trees ?

high network loads that grow quadratically with group size
Compromise network times
False positive frequency (detects a system has crashed if even if it is alive due to network issues)
time to detect failures is high

2 protocols for failure detection and message dissemination for membership updates
Time for membership changes is not affected by group size
2 step process for detecting failure by suspecting processes first
provides stable failure detection times
low message load for group members
efficient non multicast failure detection
using the dissemination component only when a membership change occurs. How is this different from other protocols ?
dissemination latency in the group grows slowly (logarithmically) with the number of members.

M ping - ack P
- If no ack from P, process M picks k random processes to ping the suspected process P. If none of them receive a ping from the suspected process P, it will be declared as failed in M’s local membership list and then passed on to the dissemination process.

Piggybacking

Piggybacking in a way, is to use the ping-ack message to send data as well. The ack message is immediately not sent by the receiver but only sent when it has additional data to send. This way, only PING-ACK messages with data are passed around.
Incarnation

SWIM assumes the worst always. Example: If node B is declared dead, an alive protocol from B will not be considered as SWIM assumes the worst, i.e, dead. Incarnation number is added, which is like a counter. A node can only increment its incarnation number. B will increment its incarnation number from 1→2 and send an alive message. Other nodes will always consider a message with higher incarnation than the one they have stored in their membership list, i.e, 1.
Generation + incarnation to tackle node restart