HUD-Software

Google CityHash

Table of contents

  1. Status
  2. Description
  3. Targets
  4. Usage
    1. Fetch with CMake
    2. Using portable CityHash functions
    3. Using CRC-32 intrinsic CityHash functions
  5. Performance
  6. For more information

Status

Windows build and test

cl_x86-64 clang-cl_x86-64

Ubuntu build and test

clang_x86-64 gcc_x86-64

Quality

codecov codeql codacy

Sanitization

cl_x86-64 clang_x86-64


Description

This repository packages the Google CityHash sources (SHA f5dc541), adapted to comply with HUD-Software quality standards.

CityHash provides hash functions for strings. The functions mix the input bits thoroughly but are not suitable for cryptography. The library is tested on little-endian architectures but should also work on big-endian ones.


Targets

cityhash target

This is the library target. It produces a static library called cityhash that can be used through the interfaces described in the src/cityhash directory.

cityhash_test target

This is the test executable target. It produces a test executable that runs all cityhash tests.


Usage

Fetch with CMake

include(FetchContent)
FetchContent_Declare(
    cityhash
    GIT_REPOSITORY  https://github.com/HUD-Software/cityhash.git
    GIT_TAG         <commit-sha>  # Replace with the SHA you want
)
FetchContent_MakeAvailable(cityhash)
target_link_libraries( my_bin_or_lib PRIVATE cityhash )


Using portable CityHash functions


32-bit hash

#include <cityhash/city.h>

// Retrieves a 32-bit hash of a slice of bytes.
const char* LIPSUM = "...";
uint32 hash_result = CityHash32(LIPSUM, strlen(LIPSUM)); // uint32 CityHash32(const char *buf, size_t len);

64-bit hash

#include <cityhash/city.h>

// Retrieves a 64-bit hash of a slice of bytes.
const char* LIPSUM = "...";
uint64 hash_result = CityHash64(LIPSUM, strlen(LIPSUM)); // uint64 CityHash64(const char *buf, size_t len);

#include <cityhash/city.h>

// Retrieves a 64-bit hash of a slice of bytes, a seed is also hashed into the result.
const char* LIPSUM = "...";
uint64 seed = 123;
uint64 hash_result = CityHash64WithSeed(LIPSUM, strlen(LIPSUM), seed); // uint64 CityHash64WithSeed(const char *, size_t, uint64);

#include <cityhash/city.h>

// Retrieves a 64-bit hash of a slice of bytes, two seeds are also hashed into the result.
const char* LIPSUM = "...";
uint64 seed_1 = 123;
uint64 seed_2 = 456;
uint64 hash_result = CityHash64WithSeeds(LIPSUM, strlen(LIPSUM), seed_1, seed_2); // uint64 CityHash64WithSeeds(const char *, size_t, uint64, uint64);

128-bit hash

#include <cityhash/city.h>

// Retrieves a 128-bit hash of a slice of bytes.
const char* LIPSUM = "...";
uint128 hash_result = CityHash128(LIPSUM, strlen(LIPSUM)); // uint128 CityHash128(const char *s, size_t len);

#include <cityhash/city.h>

// Retrieves a 128-bit hash of a slice of bytes, a seed is also hashed into the result.
const char* LIPSUM = "...";
uint128 seed = {low, high};
uint128 hash_result = CityHash128WithSeed(LIPSUM, strlen(LIPSUM), seed); // uint128 CityHash128WithSeed(const char *s, size_t len, uint128 seed);

#include <cityhash/city.h>

// Hashes a 128-bit input down to 64 bits.
uint128 hash_128_bits = {low, high};
uint64 hash_64_bits = Hash128to64(hash_128_bits); // uint64 Hash128to64(const uint128 &x);

Note: Depending on your compiler and hardware, CityHash128() is likely faster than CityHash64() on sufficiently long strings. It is slower than necessary on shorter strings.


Using CRC-32 intrinsic CityHash functions

Some functions use the SSE 4.2 x86_64 CRC-32 intrinsic (_mm_crc32_u64).


Caution: Make sure that your target supports the _mm_crc32_u64 intrinsic and that it is enabled (SSE 4.2 at minimum; AVX and AVX2 also qualify).
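The caution above can be checked in code. The sketch below is one possible approach and is not part of this library: kCompiledWithSse42, cpu_has_sse42 and can_use_crc_hashes are hypothetical names. It assumes GCC or Clang on x86, where the __SSE4_2__ macro and the __builtin_cpu_supports builtin are available; on other toolchains both checks simply report false.

```cpp
// Compile-time view: GCC and Clang define __SSE4_2__ when SSE 4.2 code
// generation is enabled (for example with -msse4.2 or -march=native).
// MSVC does not define this macro, so the constant stays false there.
#if defined(__SSE4_2__)
constexpr bool kCompiledWithSse42 = true;
#else
constexpr bool kCompiledWithSse42 = false;
#endif

// Run-time view (hypothetical helper): asks the CPU whether it implements
// SSE 4.2. Relies on a GCC/Clang builtin; other setups fall back to false.
inline bool cpu_has_sse42() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return false;
#endif
}

// Under these assumptions, the CRC variants should only be called when the
// code was built with SSE 4.2 enabled and the CPU reports support for it.
inline bool can_use_crc_hashes() {
    return kCompiledWithSse42 && cpu_has_sse42();
}
```

A caller could test can_use_crc_hashes() once at startup and fall back to the portable functions when it returns false.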

Note that, depending on the length of the buffer you want to hash, the non-intrinsic version can be faster. If the buffer is less than 900 bytes long, CityHashCrc128WithSeed and CityHashCrc128 internally call CityHash128WithSeed and CityHash128, respectively. In that case, it is better to call CityHash128WithSeed or CityHash128 directly.
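The length rule above can be expressed as a small dispatch helper. This is only a sketch: hash128_by_length and kCrcMinLength are hypothetical names, not part of the library, and the 900-byte threshold is taken from the paragraph above. The helper is generic over the two hash functions so that it compiles without the library headers.

```cpp
#include <cstddef>

// Threshold from the note above: shorter buffers end up in the portable
// functions anyway, so they are routed there directly.
constexpr std::size_t kCrcMinLength = 900;

// Hypothetical helper: calls `portable_hash` for short buffers and
// `crc_hash` otherwise. Both callables take (const char*, size_t) and
// are expected to return the same type.
template <typename PortableFn, typename CrcFn>
auto hash128_by_length(const char* data, std::size_t len,
                       PortableFn portable_hash, CrcFn crc_hash)
    -> decltype(portable_hash(data, len)) {
    if (len < kCrcMinLength) {
        return portable_hash(data, len);
    }
    return crc_hash(data, len);
}
```

With the library headers included, a call would look like hash128_by_length(buf, len, CityHash128, CityHashCrc128).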


128-bit hash with CRC-32 intrinsic

#include <cityhash/citycrc.h>

// Retrieves a 128-bit hash of a slice of bytes.
const char* LIPSUM = "...";
uint128 hash_result = CityHashCrc128(LIPSUM, strlen(LIPSUM)); // uint128 CityHashCrc128(const char *s, size_t len);

#include <cityhash/citycrc.h>

// Retrieves a 128-bit hash of a slice of bytes, a seed is also hashed into the result.
const char* LIPSUM = "...";
uint128 seed = {low, high};
uint128 hash_result = CityHashCrc128WithSeed(LIPSUM, strlen(LIPSUM), seed); // uint128 CityHashCrc128WithSeed(const char *s, size_t len, uint128 seed);

256-bit hash with CRC-32 intrinsic

#include <cityhash/citycrc.h>

// Retrieves a 256-bit hash of a slice of bytes. The hash is written to an array of four
// uint64 values, where result[0] holds the lowest bits and result[3] the highest.
const char* LIPSUM = "...";
uint64 result[4] = {0};
CityHashCrc256(LIPSUM, strlen(LIPSUM), result); // void CityHashCrc256(const char *, size_t, uint64 *);
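Because the 256-bit result comes back as four separate words, it is often handy to render it as a single string. The helper below is a hypothetical sketch, not part of the library. It assumes that the library's uint64 is an alias for std::uint64_t and that index 0 holds the lowest bits, as the comment in the example above states.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: renders four 64-bit words as one 64-digit hex string.
// The words are printed from index 3 down to index 0 so that the most
// significant digits come first.
inline std::string hash256_to_hex(const std::uint64_t (&words)[4]) {
    std::string out;
    out.reserve(64);
    char buf[17];  // 16 hex digits plus the terminating '\0'
    for (int i = 3; i >= 0; --i) {
        std::snprintf(buf, sizeof(buf), "%016llx",
                      static_cast<unsigned long long>(words[i]));
        out += buf;
    }
    return out;
}
```

The array filled by CityHashCrc256 could be passed to hash256_to_hex directly to log or compare the hash.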

Performance

On 64-bit hardware, CityHash is suitable for hashing short strings, e.g., most hash table keys. CityHash64 in particular is faster than CityHash128.

On 32-bit hardware, CityHash is the nearest competitor to Murmur3 on x86.
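Since hash table keys are the typical use case mentioned above, here is a minimal sketch of an adapter that plugs a CityHash-style function into std::unordered_map. StringHasher is a hypothetical name and not part of the library. The adapter is generic over the hash function so that it compiles without the library headers.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical adapter: wraps any callable with the shape
// (const char*, size_t) -> integer, such as CityHash64, so that it can be
// used as the Hash parameter of std::unordered_map.
template <typename HashFn>
struct StringHasher {
    HashFn hash_fn;

    std::size_t operator()(const std::string& key) const {
        // Pass size() rather than strlen() so that keys containing
        // embedded '\0' bytes are hashed in full.
        return static_cast<std::size_t>(hash_fn(key.data(), key.size()));
    }
};
```

With the library headers included, and assuming CityHash64 has the signature shown in the usage section, a map could be declared as std::unordered_map<std::string, int, StringHasher<decltype(&CityHash64)>> and constructed with a bucket count and StringHasher<decltype(&CityHash64)>{&CityHash64}.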


For more information

See the Google CityHash README