Jump consistent hashing index

Contents hide

Background

Jump consistent hash algorithm is a consistent hash algorithm that has been discussed in the previous blog Jump Consistent Hash Algorithm. After adding some new hosts in a distributed storage system, at some point we have to rebalance data across all the hosts.

Bottlenecks

A typical method to rebalance each table’s data is to scan each tuple in that table, and filter out those that need to transfer to new hosts. This method has to scan each tuple in the table and do a full hash computing for each tuple, by full hash computing, we mean two parts:

compute the tuple’s hash value based on dist-columns’ type and value
using consistent hash to map the has value to target host

Profiler shows that considerable time has been cost for the hash computing. So a practical problem comes to me:

How to quickly find out the tuples that need transfer?

For each tuple, we could compute its hash value. For this hash value and current hosts number, we could compute the next hosts number that causes the tuple transfer.

We can understand this by the process of jump consistent hash. We use initialize a pseudo random number generator using this hash value thus we get a deterministic infinite long series of number: $R_1, R_2, R_3\dots$ , for each $R_i$ we can attach it with a flag or not by an event. The largest flag index is the target host id.

Above is a re-statement of jump consistent hashing. For each tuple and hosts number N, the next flag index is defined as: $Next(tuple) = min({i|i>N, R_i\ has\ flag})$ .

Suppose we create a B-tree index for this “new column”. When we decide to expand the cluster from N to M hosts, to get the tuples that need transfer, the process is as simple as the following SQL:

select * from table where Next <= M;

And this process can take advantage of index-scan. I think it will be much faster than the typical scan-and-filter method.

Thoughts on Computing

英雄无岁，江湖无辈

Jump consistent hashing index

Background

Bottlenecks