Fast Murmur3 hash implementation for Java, mostly a fork of yonik's work with a String-optimized 128-bit implementation and a Guava-style sink added. See the blog post about the original version.
There are three 128-bit versions provided:
- `murmurhash3_x64_128`, which takes a byte buffer to hash.
- `murmurhash3_x64_128`, which takes a `CharSequence` and treats it as a UTF-8 encoded string. It avoids the quite expensive call to `String.getBytes` and decodes small chunks on the go instead. It runs in constant memory and uses a 19-element byte buffer, which the caller can provide to avoid any allocations. See the memory charts below for the difference in memory consumption.
- `murmurhash3_x64_128_ascii`, which takes a `CharSequence` and treats it as a UTF-8 encoded string containing ASCII characters only. It makes zero allocations and provides a significant performance boost.
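The chunked approach described above can be illustrated with a small standalone sketch. This is not the library's code: the class, method, and interface names are hypothetical, and the reading of the 19-byte size (up to 15 pending bytes plus one 4-byte code point) is an assumption. It only shows how a `CharSequence` can be turned into UTF-8 bytes 16 at a time through a caller-supplied buffer, without calling `String.getBytes`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; names and structure are not taken from the library.
final class Utf8ChunkSketch {
    static final int BLOCK = 16;     // Murmur3 x64_128 consumes 16-byte blocks
    static final int BUF_SIZE = 19;  // assumed: 15 pending bytes + one 4-byte code point

    /** Receives every full 16-byte block, then the final partial tail (if any). */
    interface BlockSink {
        void accept(byte[] buf, int len);
    }

    static void encodeInChunks(CharSequence s, byte[] buf, BlockSink sink) {
        int pos = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {                       // 1-byte sequence (ASCII)
                buf[pos++] = (byte) c;
            } else if (c < 0x800) {               // 2-byte sequence
                buf[pos++] = (byte) (0xC0 | (c >> 6));
                buf[pos++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Surrogate pair: one 4-byte sequence
                int cp = Character.toCodePoint(c, s.charAt(++i));
                buf[pos++] = (byte) (0xF0 | (cp >> 18));
                buf[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                buf[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                buf[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                // 3-byte sequence; unpaired surrogates are not validated here
                buf[pos++] = (byte) (0xE0 | (c >> 12));
                buf[pos++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[pos++] = (byte) (0x80 | (c & 0x3F));
            }
            if (pos >= BLOCK) {
                sink.accept(buf, BLOCK);          // hand over one full block
                System.arraycopy(buf, BLOCK, buf, 0, pos - BLOCK); // keep the overflow
                pos -= BLOCK;
            }
        }
        if (pos > 0) {
            sink.accept(buf, pos);                // tail shorter than one block
        }
    }

    /** Collects all chunks so the result can be compared with String.getBytes. */
    static byte[] encodeAll(CharSequence s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encodeInChunks(s, new byte[BUF_SIZE], (b, len) -> out.write(b, 0, len));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo w\u00f6rld, longer than sixteen bytes \u20ac \uD83D\uDE00";
        System.out.println(Arrays.equals(encodeAll(s), s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In a real hash the sink would mix each block into the hash state instead of collecting bytes, and the same buffer could be reused across calls.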
No checks are made around string decoding operations, so no errors are raised on bad input: wrong hashes will be generated silently if non-ASCII characters are passed to `murmurhash3_x64_128_ascii` or if `murmurhash3_x64_128` receives an illegal UTF-8 string.
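Because nothing is validated, a caller who is unsure about the input may want a guard of their own. The following is a minimal sketch of such a check; `AsciiGuardSketch` and `isAscii` are hypothetical names and are not part of this library.

```java
// Hypothetical helper; not part of the library.
final class AsciiGuardSketch {
    /** Returns true if every char is in the 7-bit ASCII range (0x00-0x7F). */
    static boolean isAscii(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 0x80) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAscii("plain-key-42"));
        System.out.println(isAscii("caf\u00e9"));
    }
}
```

The check adds a full pass over the string, so it gives back part of the speedup. It probably fits best in tests, assertions, or at the boundary where untrusted input enters the system, rather than on every hash call.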
Here are results for the 128-bit hash on my laptop; run `./gradlew jmh` to reproduce. One point is given for hashing 100 random strings, so a score of 38820, for example, means about 3.88M hashes/sec.
- `guava` uses `com.google.common.hash.Hashing.murmur3_128(0).hashString()`
- `murmur*Bytes` uses `murmurhash3_x64_128`, passing `s.getBytes(StandardCharsets.UTF_8)`
- `murmur*String` calls `murmurhash3_x64_128`, passing a string directly; it also reuses the 19-byte buffer
- `murmurAsciiOptimizedString` calls `murmurhash3_x64_128_ascii`
ASCII (1-128 character strings)
Benchmark Mode Cnt Score Error Units
BenchString128.guavaAscii thrpt 100 38820.061 ± 767.128 ops/s
BenchString128.murmurAsciiBytes thrpt 100 79370.216 ± 1014.949 ops/s
BenchString128.murmurAsciiString thrpt 100 83249.135 ± 521.336 ops/s
BenchString128.murmurAsciiOptimizedString thrpt 100 155105.622 ± 1984.475 ops/s
Unicode (1-64 character strings)
Benchmark Mode Cnt Score Error Units
BenchString128.guavaUnicode thrpt 100 22044.362 ± 228.743 ops/s
BenchString128.murmurUnicodeBytes thrpt 100 32272.569 ± 635.764 ops/s
BenchString128.murmurUnicodeString thrpt 100 36704.326 ± 237.645 ops/s
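Since each benchmark operation hashes 100 strings, the scores in the tables above can be converted to hashes per second, or compared against each other, with simple arithmetic. A small sketch follows; the class and method names are made up for illustration.

```java
// Hypothetical helper for reading the JMH scores; not part of the library.
final class JmhScoreSketch {
    static final int STRINGS_PER_OP = 100; // one benchmark op hashes 100 strings

    /** Converts a JMH throughput score (ops/s) to hashes per second. */
    static double hashesPerSecond(double opsPerSecond) {
        return opsPerSecond * STRINGS_PER_OP;
    }

    /** Ratio of one throughput score to a baseline score. */
    static double speedup(double score, double baseline) {
        return score / baseline;
    }

    public static void main(String[] args) {
        double guavaAscii = 38820.061;
        double asciiOptimized = 155105.622;
        System.out.printf("guavaAscii: %.0f hashes/s%n", hashesPerSecond(guavaAscii));
        System.out.printf("ascii-optimized vs guava: %.2fx%n", speedup(asciiOptimized, guavaAscii));
    }
}
```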
Here are memory usage charts for the different versions.
- Guava is `com.google.common.hash.Hashing.murmur3_128(0).hashString()`
- Bytes is `murmurhash3_x64_128` receiving `s.getBytes(StandardCharsets.UTF_8)`
- String is `murmurhash3_x64_128` getting a string directly and having to allocate `buf19` on every call
- String, provided buffer is `murmurhash3_x64_128` getting a string directly, reusing `buf19`
Simply copy it to your project or get it from Maven Central.