[bug]: The result of hash string is not consistent when running in multiple goroutines
baolinCloud opened this issue · 6 comments
go-jump-consistent-hash/jump.go
Line 33 in 7f6b38d
I tested hash "123456789" with fnv.New64a and 2 buckets, the expected res is 0, but in goroutine sometimes res is 1.
func TestHashString(t *testing.T) {
hasher := New(2, fnv.New64a())
input := "123456789"
errCount := make(chan int)
for i := 0; i < 1000; i++ {
go func(i int) {
res := hasher.Hash(input)
if res != 0 {
errCount <- 1
t.Errorf("hash %d error: res is %d", i, res)
}
}(i)
}
s := 0
go func() {
for j := range errCount {
s += j
}
}()
time.Sleep(1 * time.Second)
fmt.Println("errCount:", s)
}
I think the HashString function is not goroutine safe.
This is a tricky one to solve 🤔 To work around it I guess you have to prevent concurrent usage by creating a new hasher inside the loop like so:
func TestHashString(t *testing.T) {
- hasher := New(2, fnv.New64a())
input := "123456789"
errCount := make(chan int)
for i := 0; i < 1000; i++ {
go func(i int) {
+ hasher := New(2, fnv.New64a())
res := hasher.Hash(input)
if res != 0 {
errCount <- 1
t.Errorf("hash %d error: res is %d", i, res)
}
}(i)
}
s := 0
go func() {
for j := range errCount {
s += j
}
}()
time.Sleep(1 * time.Second)
fmt.Println("errCount:", s)
}
I use the function:
func HashString(key string, buckets int) int32 {
h := fnv.New64a()
h.Reset()
_, err := io.WriteString(h, key)
if err != nil {
return 0
}
return Hash(h.Sum64(), int32(buckets))
}
func BenchmarkHashStringFNV1a(b *testing.B) {
s := "123456789"
buckets := 2
for i := 0; i < b.N; i++ {
HashString(s, buckets)
}
}
Result:
BenchmarkHashStringFNV1a
BenchmarkHashStringFNV1a-12 17144026 70.3 ns/op
I tested the mutex, little lower.
func BenchmarkHashStringFNV1a2(b *testing.B) {
s := "123456789"
hasher := New(2, FNV1a)
for i := 0; i < b.N; i++ {
hasher.Hash(s)
}
}
benchmark with mutex:
BenchmarkHashStringFNV1a2
BenchmarkHashStringFNV1a2-12 15999657 79.8 ns/op
I can only think of two solutions for this problem though:
- Either use a mutex to prevent buffer corruption of the underlying
hash.Hash
object. - Always create a new instance of the
hash.Hash
/KeyHasher
object.
Note that this is only a problem when sharing a hasher between goroutines. Ideally I would like to change the function signature of HashString
like below. It would fix some of the usage errors, but it would also break compatibility which I'm not super keen on:
-func HashString(key string, buckets int32, h KeyHasher) int32 {
+func HashString(key string, buckets int32, h func() KeyHasher) int32 {
Also, try using jump.NewFNV1a()
instead of fnv.New64a()
since it's just a thread safe wrapper around the latter.