Consistent Hashing
Problems before consistent hashing
Assume we have a cluster of 2 servers: S0 and S1
We use a simple hash function that is based on the number of servers we have, example:
value % (number of servers)
for an incoming value of 3, 3%2 = 1
, so we hit server 1, as the hash functions tell us where the data is present.
If we increase the number of servers to 3, the hash function changes and then the same value 3, is now 3%3=0
, so we hit server 0. But the data is actually in server 1 unless we re-shuffle the data everytime we add and remove servers which is very expensive and complicated.
To avoid, this issue, the concept of consistent hashing is used where we approximate a server and not basing the hash function on the number of servers. Consistent hashing works regardless of servers added or removed from the cluster.