Consistent Hashing

In computer science, consistent hashing is a special kind of hashing technique such that when a hash table is resized, only n/m keys need to be remapped on average where n is the number of keys and m is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation.

Basic technique

Consistent hashing was designed to avoid the problem of having to reassign every BLOB when a server is added or removed throughout the cluster. The central idea is, we use a hash function that randomly maps both the BLOB and servers to a unit circle, usually 2radians. For example, {= % 360} (where is hash of a BLOB or server’s identifier, like IP address or UUID). Each BLOB is then assigned to the next server that appears on the circle in clockwise order. Usually, binary search algorithm or linear search is used to find a spot” or server to place that particular BLOB in O(N) or O(N) complexities respectively; and in every iteration, which happens in clockwise manner, an operation {  } (where is the value of the server within the cluster) is performed to find the server to place the BLOB. This provides an even distribution of BLOBs to servers. But, more importantly, if a server fails and is removed from the circle, only the BLOBs that were mapped to the failed server need to be reassigned to the next server in clockwise order. Likewise, if a new server is added, it is added to the unit circle, and only the BLOBs mapped to that server need to be reassigned.


uid: 202208200937 tags: #software-engineering #algorithms


Date
February 22, 2023