puzpuzpuz/xsync

Built-in hash function for comparable types

psyhatter opened this issue · 5 comments

Hey @puzpuzpuz
Thanks for such a cool concurrent data synchronization package

I really like the example of a hash function for a comparable structure, because there is no magic in it, math and only

xsync/example_test.go

Lines 16 to 32 in 051117f

type Person struct {
GivenName string
FamilyName string
YearOfBirth int16
}
age := xsync.NewTypedMapOf[Person, int](func(seed maphash.Seed, p Person) uint64 {
var h maphash.Hash
h.SetSeed(seed)
h.WriteString(p.GivenName)
hash := h.Sum64()
h.Reset()
h.WriteString(p.FamilyName)
hash = 31*hash + h.Sum64()
h.Reset()
binary.Write(&h, binary.LittleEndian, p.YearOfBirth)
return 31*hash + h.Sum64()
})

And this comment gave me the idea to get the built-in hashing function from the map, in a slightly tricky and insecure magical way

xsync/mapof.go

Lines 32 to 33 in 051117f

// are supported. That's because Golang standard library does not
// expose the built-in hash functions for interface{} values.

Here's what I got: https://goplay.tools/snippet/w91su4PCNMY

func main() {
	type t struct{ int }

	hash := HashFuncOf[t]()
	if hash == nil {
		log.Fatalln("hash func is nil")
	}

	seed := maphash.MakeSeed()
	a, b := hash.Sum64(seed, t{1}), hash.Sum64(seed, t{1})
	fmt.Println(a == b)

	// Output:
	// true
}

This is not safe because future versions of golang may change the internal arrangement of types.
On the other hand, it might be OK if you put this idea in a separate package like github.com/puzpuzpuz/xsync/unsafehasher and warn the user about the potential problem in the documentation or init function, like this:

// This init function will prevent the application from starting
// if the internals of golang types change in such a way that
// this package causes a panic.
func init() {
	type t struct{ bool; int; string; float64 }
	HashFuncOf[t]().Sum64(maphash.MakeSeed(), t{})
}

Also, the trick has a minus, which consists in the fact that the built-in hash function requires passing by pointer, which can negatively affect performance, but it can probably be reduced using sync.Pool

What do you think about supporting this way of creating hash functions?

And yet, it seems that this comment about string keys is superfluous here, apparently accidentally copied from the implementation of Map

xsync/mapof.go

Lines 31 to 34 in 051117f

// One important difference with sync.Map is that only string keys
// are supported. That's because Golang standard library does not
// expose the built-in hash functions for interface{} values.
type MapOf[K comparable, V any] struct {

That's a fragile, but cool hack. Maybe it could be exposed in the following way?

m := xsync.NewTypedMapOf[Person, int](UnsafeHashFuncOf[Person]())

On the other hand, I'm not sure how valuable it would be for potential users. IMO writing a hash function with hashmap is an easy task and you usually don't have a lot of maps with different struct key types in your application. WDYT?

And yet, it seems that this comment about string keys is superfluous here

Thanks, that's a leftover from one of the old version. Removed it in #72.

That's really amazing!

xsync/util.go

Lines 51 to 57 in 7784f45

// hashUint64 calculates a hash of v with the given seed.
//
//lint:ignore U1000 used in MapOf
func hashUint64(seed maphash.Seed, v uint64) uint64 {
nseed := *(*uint64)(unsafe.Pointer(&seed))
return uint64(memhash64(unsafe.Pointer(&v), uintptr(nseed)))
}

Well, maybe then make it a generic function and use it if (MapOf).hasher == nil?

Something like that:

func builtinhash[T comparable](seed maphash.Seed, v T) uint64 {
	return uint64(memhash64(unsafe.Pointer(&v), uintptr(*(*uint64)(unsafe.Pointer(&seed)))))
}

The function you're referring to assumes 64 bit size only (e.g. int64 or uint64 or uniptr on a 64-bit system). There is a separate runtime.memhash which takes an input of arbitrary length and returns its hash. The runtime uses it (and some variations) to calculate a hash of var-length types (string) and not only. I guess it's possible to use it to write a versatile hash function, but it's tricky and I didn't want to mess with internals that much. The only reason I used memhash64 is that the previous function was significantly slower than this hack.