panic: fatal error: concurrent map writes
Opened this issue · 2 comments
ZeroYuJie commented
I got the error "panic: concurrent map writes" in the BPE TokenizeWithCache func. Concurrent read and write operations on the map can lead to a panic.
func (b BPE) TokenizeWithCache(sequence string) (retVal []tokenizer.Token) {
	if hit, ok := b.Cache.cmap[sequence]; ok {
		return b.WordToTokens(hit)
	} else {
		word := b.MergeWord(sequence)
		retVal = b.WordToTokens(*word)
		if b.Cache != nil {
			b.Cache.SetValues([]CacheItem{
				{sequence, *word},
			})
		}
		return retVal
	}
}
Please check~
sugarme commented
Please share the error log detail and example how to replicate. Thanks!
ZeroYuJie commented
@sugarme
I am using this in a multi-goroutine test. First I use this func to initialize the model tokenizer, then I store the tokenizer in a global variable that all goroutines share.
The code looks like this:
func OfflineLLMTokenizerInit(modelName string) (*tokenizer.Tokenizer, error) {
	configFile, err := tokenizer.CachedPath(modelName, "tokenizer.json")
	if err != nil {
		return nil, err
	}
	tk, err := pretrained.FromFile(configFile)
	if err != nil {
		return nil, err
	}
	return tk, nil
}

var tk *tokenizer.Tokenizer

func main() {
	tk, _ = OfflineLLMTokenizerInit("NousResearch/Redmond-Puffin-13B")
	benchNum := 10000
	for i := 0; i < benchNum; i++ {
		go func(number int) {
			// random string, len = 1000
			input := random.RandString(1000)
			encoderSingle, _ := tk.EncodeSingle(input, false)
			println(fmt.Sprintf("routine=%d,%s,len=%d", number, input, len(encoderSingle.Tokens)))
		}(i)
	}
	time.Sleep(time.Minute)
}
Then it throws the panic.
The stack:
The cause is the cache map b.Cache.cmap, which all goroutines share through the global tokenizer.
I think cmap should use sync.Map,
or the cache should be removed...
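To illustrate the sync.Map idea, here is a minimal sketch of a cache that is safe to share across goroutines. It is not the library's code: SafeCache, its Get and Set methods, and the Word stand-in type are hypothetical names made up for this example, and the real cache may need extra behavior (such as a capacity limit) that is left out here.

```go
package main

import (
	"fmt"
	"sync"
)

// Word is a stand-in for the tokenizer's merged-word type (hypothetical).
type Word struct {
	Symbols []string
}

// SafeCache is a hypothetical cache backed by sync.Map, which is safe
// for concurrent use without additional locking.
type SafeCache struct {
	m sync.Map // holds string keys and Word values
}

// Get returns the cached Word for key, if present.
func (c *SafeCache) Get(key string) (Word, bool) {
	v, ok := c.m.Load(key)
	if !ok {
		return Word{}, false
	}
	return v.(Word), true
}

// Set stores w under key, overwriting any previous value.
func (c *SafeCache) Set(key string, w Word) {
	c.m.Store(key, w)
}

func main() {
	cache := &SafeCache{}
	var wg sync.WaitGroup

	// Many goroutines read and write a small set of keys at once,
	// similar to the shared-tokenizer test above.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			key := fmt.Sprintf("seq-%d", n%10)
			if _, ok := cache.Get(key); !ok {
				cache.Set(key, Word{Symbols: []string{key}})
			}
		}(i)
	}
	wg.Wait()

	w, ok := cache.Get("seq-3")
	fmt.Println(ok, w.Symbols)
}
```

Two goroutines can still both miss and both store the same key, but that only repeats work and does not panic. A plain map guarded by sync.RWMutex would be another option; sync.Map tends to fit a cache that is read far more often than it is written.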