A Lemma on Number Theory

In today’s techtalk in Pivotal Inc. one of my colleagues asked a question about the benefits of double the batch size of one step in a hashjoin algorithm. I think it is not obvious enough to understand it at first glimpse. So I offer the formal description and proof here to make sure that myself understand it well.

The problem in informal language

We have a hash function h(x) = x, and we have a hashtable whose bucket number is N, and we decide to divide each number into different batches. Let B be the number of all the batches, then for number x, its corresponding batch ID is batch_{id}(x) = (h(x) / N) \b mod B.

At some point, we find that the number of numbers in the batch is larger than N, and this means that we need to enlarge the total batch numbers.  In the algorithm, we set B to 2B.  And the benefit is that recalculating the batch id of this batch, the new batch ids are are larger than current batch id. In other words, redistribution can only move current batch elements to the latter batches.

The lemma in formal language

The problem above can be represented in formal language as follows:

\forall a, b \in N, b > 0, a \bmod 2b \ge a \bmod b.

Let’s prove this lemma:

a \bmod 2b \ge a \bmod b  \iff a - 2b \lfloor \frac{a}{2b} \rfloor \ge a - b \lfloor \frac{a}{b} \rfloor  \iff \lfloor \frac{a}{b} \rfloor \ge 2\lfloor \frac{a}{2b} \rfloor

let m = \lfloor \frac{a}{2b} \rfloor, so m \in N,  according to lemma in concrete mathematics Page 69’s (3.5), we have:

m \le \frac{a}{2b} < m + 1 \implies 2m \le \frac{a}{b} < 2m + 2\\  2m \le \frac{a}{b}, 2m \in N

Now,  according to concrete mathematics Page 69’s (3.7), we have

2m \le \frac{a}{b}, 2m \in N \iff 2m \le \lfloor\frac{a}{b} \rfloor \iff 2\lfloor \frac{a}{2b} \rfloor \le \lfloor\frac{a}{b} \rfloor.

Proof end.

Other enlarging strategies?

If we just increase the batch size by m, 0 \le m < B, would we still enjoy the good property a \bmod (B + m) \ge a \bmod B?

The answer is no. To disprove it, all we need to do is to offer a bad case: a = B + m. Then a \bmod B = m > a \bmod (B+m) = 0.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.