Background
Jump consistent hash algorithm is a consistent hash algorithm that has been discussed in the previous blog Jump Consistent Hash Algorithm. After adding some new hosts in a distributed storage system, at some
Bottlenecks
A typical method to rebalance each table’s data is to scan each tuple in that
- compute the tuple’s hash value based on dist-columns’ type and value
- using consistent hash to map the has value to target host
Profiler shows that considerable time has been cost for the hash computing. So a practical problem comes to me:
How to quickly find out the tuples that need transfer?
Jump consistent hashing index
For each tuple, we could compute its hash value. For this hash value and current hosts number, we could compute the next hosts number that causes the tuple transfer.
We can understand this by the process of jump consistent hash. We use initialize a pseudo random number generator using this hash value thus we get a deterministic infinite long series of number: R_1, R_2, R_3\dots, for each R_i we can attach it with a flag or not by an event. The largest flag index is the target host id.
Above is a re-statement of jump consistent hashing. For each tuple and hosts number N, the next flag index is defined as: Next(tuple) = min({i|i>N, R_i\ has\ flag}).
Suppose we create a B-tree index for this “new column”. When we decide to expand the cluster from N to M hosts, to get the tuples that need transfer, the process is as simple as the following SQL:
select * from table where Next <= M;
And this process can take advantage of index-scan. I think it will be much faster than the typical scan-and-filter method.
Pingback: Pivotal-hash: a new consistent hash algorithm | Thoughts on Computing