Pivotal-hash: a new consistent hash algorithm

Contents hide

1 Previous Posts

2 Maglev Hahsing Paper

3 Pivotal-Hash algorithm

4 Some notes

4.1 Performance

4.2 How to chooces M?

My previous posts on consistent hash are:

The above algorithm is beautiful while its time complexity is $O(n)$ will introduce some CPU overhead in some extreme conditions compared to simple modulo hashing.

Maglev Hahsing Paper

One of my colleagues searches Google to find another consistent algorithm called maglev (中文这个单词是”磁悬浮”的意思). The original paper is here: Maglev: A Fast and Reliable Software Network Load Balancer. The hash algorithm is just a small piece section of the original paper and lack of mathematical analysis. And there is a golang implementation on Github.

I don’t want to analyze the algorithm here. The algorithm is not beautiful. The key idea is to compute the target segment by looking up in a pre-computed array (called lookup table). The paper suggests that the size of lookup table need to be much larger than the number of the segments (more than 100 times). The computing progress needs to maintain a huge permutation table (size of lookup table times the number of segments). The paper also introduces a permutation table generating algorithm which is complicated (at least for the fist-eye). It mentions that Knuth-reshuffle will do the work well also.

The biggest flaw of the paper in my mind is that the hash algorithm is not monotonous. See the example of the paper:

Lookup tables of different nodes in the original paper

After removing the nodes B1, look at the last hash mapping, B0 changes to B2. This means that if this algorithm is used for some storage system, some data movement will happen among untouched nodes, which in my mind is bad.

Pivotal-Hash algorithm

Based on the idea of lookup table, a beautiful mathematic formula comes to my mind.

$n = \lceil \frac{n}{m} \rceil + \lceil \frac{n-1}{m} \rceil + \dots + \lceil \frac{n-m+1}{m} \rceil = \sum_{k=0}^{k=m-1} \lceil \frac{n-k}{m} \rceil$

This formula is from the famous book 《Concrete mathematics : a foundation for computer science》. It solves the problem:

This is just the requirement of balance for consistent hashing. So using the above formula we can compute a best configuration given the number of segments and the size of the lookup table. Suppose at first we have n segments, and we decide to add one more segment.

Configuration for n: $\{s_1, s_2, \dots, s_n\}$
Configuration for n+1: $\{t_1, t_2, \dots, t_n, t_{n+1}\}$

It is easy to verify that $s_i \ge t_i$ . We can explain $t_{n+1}$ like this:

$t_{n+1} = (s_1-t_1) + (s_2 - t_2) + \dots + (s_n-t_n)$

The above formula tells us that we could find a consistent algorithm that is monotonous.

Below is my python code to implement this algorithm:

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

from copy import deepcopy
from math import ceil
from collections import defaultdict


class Phash:

    def __init__(self, Nsegs, M):
        self.Nsegs = Nsegs
        self.M = M

    def build_lookup(self):
        lookup = [0] * self.M
        for i in range(1, self.Nsegs):
            lookup = self._build_bookup(i, lookup)
        return lookup

    def _build_bookup(self, i, lookup):
        """
        from i-1 enlarge to i
        """
        prev_config = self.compute_config(i-1)
        curr_config = self.compute_config(i)
        delta = [(n1-n2)
                 for n1, n2 in zip(prev_config, curr_config)]

        cp_lookup = deepcopy(lookup)
        for index, seg in enumerate(lookup):
            d = delta[seg]
            if d == 0:
                continue
            cp_lookup[index] = i
            delta[seg] -= 1

        self.is_good(lookup, cp_lookup)

        return cp_lookup

    def compute_config(self, i):
        ii = i + 1
        return [int(ceil(float(self.M - x) / ii))
                for x in range(ii)]

    def report(self, lookup):
        print(lookup)
        d = defaultdict(int)
        for i in lookup:
            d[i] += 1

        for k, v in d.items():
            print(k, v)

    def is_good(self, lk1, lk2):
        new_seg = max(lk2)
        for a, b in zip(lk1, lk2):
            if b != new_seg:
                assert(a == b)

if __name__ == "__main__":
    p = Phash(5, 17)
    p.report(p.build_lookup())

Some notes

Performance

Jump consistent algorithm contains almost pure-computing instructions. However, lookup table method has to access memory. If the size of lookup table is large (in fact, it should be large), the table cannot be put into cache. Accessing memory is much slower than accessing registers.

How to chooces M?

At the first eye, M should be larger than the number of segments. But how larger? Let’s do some mathematical reasoning.

For the best balance algorithm, pivotal-hash, if we denote the number of slots in lookup table occupied by segment i as $nslot(i)$ then $max(nslot(i) - nslot(j)) = 1$ .

So the skew-ratio can be modeld as $\frac{1}{\frac{M}{N}} = \frac{N}{M}$ . We surely want the skew-ratio to be as smaller as possible. If the tolerate-boundary is $a$ , then $\frac{N}{M} < a \implies M > \frac{N}{a}$ .

For tolerate-boundary is $a=1\%$ , for the best algorithm, we have to set $M > 100N$ .

Thoughts on Computing

英雄无岁，江湖无辈

Pivotal-hash: a new consistent hash algorithm

Previous Posts

Maglev Hahsing Paper

Pivotal-Hash algorithm

Some notes

Performance

How to chooces M?

Leave a ReplyCancel reply

Previous Posts

Maglev Hahsing Paper

Pivotal-Hash algorithm

Some notes

Performance

How to chooces M?

Leave a ReplyCancel reply

Discover more from Thoughts on Computing