/HashBin

A small helper to bin your things.

Primary LanguagePHPMIT LicenseMIT

HashBin

Build Status Code Climate Code Climate

This library takes a string value and generates integer hash codes from it. Useful if you want to calculate a "random" integer from some value that remains consistent across calls. PHP has access to a wealth of hashing functions, but their outputs are nearly always strings. This wraps those hash functions and some additional logic into a simple object-oriented interface.

Installation

composer require mcordingley/hash-bin

Quick Start

use mcordingley\HashBin\BinStrategies\Modulo;
use mcordingley\HashBin\HashBin;
use mcordingley\HashBin\Hashers\IntegerInterpreters\Unpack;
use mcordingley\HashBin\Hashers\Sip24;

// Use library defaults.
$hashBin = HashBin::make();

// Or use other options.
$hashBin = new HashBin(new Sip24($key, new Unpack), new Modulo);

// From 0 to 15, inclusive.
$first = $hashBin->bin('test', 15);

// From 5 to 15, inclusive.
$second = $hashBin->binRange('test', 5, 15);

API

Usage starts with the HashBin class. You can create a new instance either directly with the new operator or with the make static factory method. make will give you a configuration chosen to work for the majority of use cases and should be your go-to unless there is reason otherwise. The defaults may change in a future release and will not be considered a breaking change, so if you depend on having the bin outputs not change between releases consider explicitly setting your configuration through the constructor method.

In addition to the bin and binRange methods described above, HashBin comes with setBase(string $base) to set a common base value to mix into later values. This is done so that different HashBin instances can provide different sets of outputs.

The HashBin constructor takes two arguments, a Hasher and a BinStrategy. The Hasher is responsible for converting supplied strings into integer values. BinStrategy then takes that integer and constrains it to the specified range.

Integer Interpreters

Integer interpreters are used by some hashers to convert binary strings into integers. Unpack is the only supplied interpreter and takes no arguments in its constructor.

Hashers

CRC32 is the default hasher implementation and uses the CRC32 hash function to directly convert strings to integers. It is extremely fast, but provides no guarantees against collisions if given user-generated values. Its constructor takes no arguments. This hasher should be your default choice.

If you require some cryptographic hardening against potential collisions, use Sip24. This hasher requires libsodium to be present, either via PHP 7.2 or newer or through having the sodium_compat library installed. It uses SipHash-2-4 to calculate its hashes. SipHash is a keyed hash function that works well with short inputs. Its constructor takes a secret key SODIUM_CRYPTO_SHORTHASH_KEYBYTES long and an integer interpreter. The key should be a raw binary string that persists between uses of this library. Ideally, the key should be generated with the random_bytes method: random_bytes(SODIUM_CRYPTO_SHORTHASH_KEYBYTES). Store the key encoded in hexadecimal (bin2hex) or base64 (base64_encode) and be sure to decode it before use.

Native wraps the hash function in PHP and enables the use of any hash algorithm supported by that method. Since this returns a string instead of an integer, it must also be supplied with an IntegerInterpreter to translate that binary string into an integer, so the constructor signature is thus: new Native('algo', new Unpack). Note that only enough bits are used from the output to construct an integer, so even if the underlying hash function is collision resistant, the bins that are output are not.

Bin Strategies

Two bin strategies are provided with the library: Multiply and Modulo.

Multiply is the recommended implementation and is used in HashBin::make(). Its constructor takes a single, optional argument. The default value is the recommended one and should be used unless there is specific reason otherwise. This implementation provides good all-around characteristics.

Modulo uses the modulo arithmetic operator to constrain the integer range. It has the downside that it will slightly overweight lower values in its range against higher values, but also provides somewhat more predictable output values given its input.