lithammer/go-jump-consistent-hash

[bug]: The result of hash string is not consistent when running in multiple goroutines

baolinCloud opened this issue · 6 comments

I tested hash "123456789" with fnv.New64a and 2 buckets, the expected res is 0, but in goroutine sometimes res is 1.

func TestHashString(t *testing.T) {
	hasher := New(2, fnv.New64a())
	input := "123456789"
	errCount := make(chan int)
	for i := 0; i < 1000; i++ {
		go func(i int) {
			res := hasher.Hash(input)
			if res != 0 {
				errCount <- 1
				t.Errorf("hash %d error: res is %d", i, res)
			}
		}(i)
	}

	s := 0
	go func() {
		for j := range errCount {
			s += j
		}
	}()

	time.Sleep(1 * time.Second)
	fmt.Println("errCount:", s)
}

I think the HashString function is not goroutine safe.

This is a tricky one to solve 🤔 To work around it I guess you have to prevent concurrent usage by creating a new hasher inside the loop like so:

func TestHashString(t *testing.T) {
-	hasher := New(2, fnv.New64a())
	input := "123456789"
	errCount := make(chan int)
	for i := 0; i < 1000; i++ {
		go func(i int) {
+			hasher := New(2, fnv.New64a())
			res := hasher.Hash(input)
			if res != 0 {
				errCount <- 1
				t.Errorf("hash %d error: res is %d", i, res)
			}
		}(i)
	}

	s := 0
	go func() {
		for j := range errCount {
			s += j
		}
	}()

	time.Sleep(1 * time.Second)
	fmt.Println("errCount:", s)
}

Hopefully fixed by GH-10

I use the function:

func HashString(key string, buckets int) int32 {
	h := fnv.New64a()
	h.Reset()
	_, err := io.WriteString(h, key)
	if err != nil {
		return 0
	}
	return Hash(h.Sum64(), int32(buckets))
}

func BenchmarkHashStringFNV1a(b *testing.B) {
	s := "123456789"
	buckets := 2
	for i := 0; i < b.N; i++ {
		HashString(s, buckets)
	}
}

Result:

BenchmarkHashStringFNV1a
BenchmarkHashStringFNV1a-12    	17144026	        70.3 ns/op

I tested the mutex, little lower.

func BenchmarkHashStringFNV1a2(b *testing.B) {
	s := "123456789"
	hasher := New(2, FNV1a)
	for i := 0; i < b.N; i++ {
		hasher.Hash(s)
	}
}

benchmark with mutex:

BenchmarkHashStringFNV1a2
BenchmarkHashStringFNV1a2-12    	15999657	        79.8 ns/op

I can only think of two solutions for this problem though:

  • Either use a mutex to prevent buffer corruption of the underlying hash.Hash object.
  • Always create a new instance of the hash.Hash/KeyHasher object.

Note that this is only a problem when sharing a hasher between goroutines. Ideally I would like to change the function signature of HashString like below. It would fix some of the usage errors, but it would also break compatibility which I'm not super keen on:

-func HashString(key string, buckets int32, h KeyHasher) int32 {
+func HashString(key string, buckets int32, h func() KeyHasher) int32 {

Also, try using jump.NewFNV1a() instead of fnv.New64a() since it's just a thread safe wrapper around the latter.

Duplicate of #6