lemire/fastrange

Have you considered reversing the bit-sequence of the input value 'word'?

Closed this issue · 6 comments

Have you considered reversing the bit-sequence of the input value 'word'?
In Java there is a nice function Integer.reverse(int):int which reverses the bit-sequence of an int.
This gives a better distribution for small integer values, which are more common.

public class FastRange {
private static int reverseFastRange(int word, int p) {
return (int) (((0xffffffffL & Integer.reverse(word)) * p) >>> 32);
}
}

The problem we are solving is to take a machine word and map it to an integer value in a range [0,p) as fairly as possible. Obviously, you can apply to the machine word any bijective map you'd like. Since speed is the concern, you should use a fast bijection...

Of course, there might be different ways to implement a bit reversal function, but here is one such function:

public static int reverse(int val)  {
      val = ((val >> 1) & 0x55555555) + ((val << 1) & ~0x55555555);
      val = ((val >> 2) & 0x33333333) + ((val << 2) & ~0x33333333);
      val = ((val >> 4) & 0x0f0f0f0f) + ((val << 4) & ~0x0f0f0f0f);
      val = ((val >> 8) & 0x00ff00ff) + ((val << 8) & ~0x00ff00ff);
      return ((val >> 16) & 0x0000ffff) + ((val << 16) & ~0x0000ffff);
}

That function alone is likely to compile down to 20 instructions or so. So the bit reversal function is likely much more expensive than the multiply and shift routine.

I would think that if you'd like to have a fast bit-mixing bijection, bit reversal is hardly desirable in this respect. It is likely slow, and it is unclear whether it has desirable pseudo-random properties. In any case, it is somewhat of an orthogonal issue: you can, of course, apply any bijection you'd like... but I would not be tempted to use bit reversal.

So, in terms of design, I can split the code up into a mapping of the input values and a call to fastRange.

I am looking for a mapping that distributes the output value in a similar way that a modulo does for a given 'p'.
So, a faster mapping of the input values might be a rotation of the integer 'word'.

/** Mapping from a number 'word' into range [0,'p'), with a similar distribution like the modulo operator for a given 'p'. */
private static int rotatedFastRange(int word, int p) {
    return fastRange(Integer.rotateLeft(word,Integer.numberOfLeadingZeros(p)),p);
}
/** Lemire's fast range mapping from a number 'word' into range [0,'p'). */
private static int fastRange(int word, int p) {
    return (int) (((0xffffffffL & word) * p) >>> 32);
}

Thank you very much!

I am looking for a mapping that distributes the output value in a similar way that a modulo does for a given 'p'.

Can you define what you are after? Can you define the problem you are working on...

If you want random numbers in [0,p), then you should use a fast RNG like PCG and use their interval-value generator. See this reference
https://arxiv.org/abs/1805.10941

If you want to actually compute x%p quickly for a fixed p, then you can do it with two multiplications and a shift, see https://arxiv.org/abs/1902.01961

I do not understand what problem your proposal is meant to solve.

Can you express mathematically this concept: "with a similar distribution like the modulo operator"?

I am implementing currently the following things:

    1. immutable Set classes in Java that have a proper read-only API.
    1. insertion-only Sets for graph searches.

I am looking for an avalanche function that maps a 32-bit hash value into the range [0,p).
("avalanche": so that even small changes in the 32-bit hash yield different result values.)

32-bit hash values in Java often have high variability in the lower range of the value. So, a modulo is often better than the fair range function. That's why the implementations of Sets in Java use modulo or - if p is a power of 2 - just use a bit-mask to extract the lower bits.

The function, that I have proposed in my previous message, boils down to rotate, multiply, shift. (Actually shift, multiply, shift will do).

Like modulo, it is unfortunately not an avalanche function - so it sucks - but it does need less resources to compute.

I think that the concept of avalanche is defined for power-of-two outputs: flipping one bit of the input has a 50% probability of flipping any bit in the output. It is not applicable to the range selection.

You can achieve a very good avalanche effect using the Murmur3 finalizer (this code is in C++, but the equivalent in Java is easy):

uint32_t murmur32(uint32_t h, const murmur32_t * key) {
  h ^= h >> 16;
  h *= 0x85ebca6b;
  h ^= h >> 13;
  h *= 0xc2b2ae35;
  h ^= h >> 16;
  return h;
}

And then you can use the fast range technique.

I recommend against creating your own bit mixing functions. I recommend using well studied, well tested techniques.

Oh, this is very helpful! Thank you very much, I am going to use this approach.