Consistent Hashing
Distribute Data/Connections across all servers evenly using a good hash function (like MD5 or Murmur hash)
Simple hashing, when the number of servers are fixed.
$ serverIndex = hash(key) % N $ N = # of servers
If any server goes down, then N changes and the impact is drastic as most of the keys will have to be redistributed
Consistent Hashing
We map both servers and objects onto the hash ring, using a uniformly distributed hash function
In addition to hashing the object keys, we also hash the server names and come with range of values called hash space
The concept of Ring
Place the servers onto the ring by hashing its Id.
then we hash each object by its keys (sessionId, transactionId etc.) using the same hashing function, and use the hash directly to map the key onto the ring
To locate the server for a particular object, we go clockwise from the location of the object key on the ring until a server is found.
if a new server is added into the ring(with its new hash based on its id), only the keys left to it will get affected
- with simple hashing, when a new key is added, almost all the keys need to be remapped
- with consistent hashing, adding a new server only requires redistributing of a fraction of the keys
Potential Issue
With hash functions, achieving a perfect distribution & equally sized segments on the rings is very unlikely. Conceptually, random points are picked on the ring
Situation might occur that a lot of objects map to a single server unevenly, leaving other servers free. the problem is exacerbated if the servers are frequently added or removed.
This problem is resolved with the usage of the virtual nodes
We can have x
servers with y
virtual nodes for each. More virtual server
implies more distribution over the ring
But, maintaining the metadata for the virtual nodes take up more space, so a trade-off to tune the number of virtual nodes to fit our system requirements
To be used to distribute Data evenly
Cassandra and Dynamo DB uses consistent hashing for Data Partitioning
Data is distributed across multiple servers (horizontal scaling) in a distributed DB environment.
for predictable performance, Evenly distributed data accross all servers is what is aimed
To bve used for load balancing
Taking the requests and evenly balance the load among N servers
Distribute persistent connection evenly. this limits the number of connections that needs to be restablished when a backend server goes down
Hash the serverId along with the requestId
Amazon DynamoDB | Data Partitioning. Helps DB minimize data movement during rebalancing
|
Apache Cassandra | |
Content Delivery Networks | Distributes Web Content Evenly among the edge servers
|
like Acamai | |
Load Balancers | Distribute persistent connection evenly across backend servers
|
like Google Load Balancers |
Guaranteed Strong Consistency
$$ R + W > N $$ 𝑅 : is the number of replicas that agreed on read, 𝑊 : is the number of replicas that successfully take a write, and 𝑁 : is the total number of replicas,