Can the segmenter keep the original case of a sentence instead of converting it to lowercase?
Opened this issue · 3 comments
ivory2406 commented
Hello, I want to keep uppercase letters. For example:
func main() {
	text := "Hello world, Helloworld. Winter is coming! 你好世界."
	jieba := new(gse.Segmenter)
	jieba.LoadDict()
	res := jieba.Cut(text)
	println(ToJson(res)) // ToJson is a JSON helper that is not shown in this issue
}
The result is: ["hello"," ","world",","," ","helloworld","."," ","winter"," ","is"," ","coming","!"," ","你好","世界","."]
I hope the result is: ["Hello"," ","world",","," ","Helloworld","."," ","Winter"," ","is"," ","coming","!"," ","你好","世界","."]
I have also looked at the option parameters here: https://github.com/go-ego/gse/blob/master/segmenter.go
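Until an option is confirmed, one possible workaround is to map the lowercased tokens back onto the original text. The sketch below is not part of gse: `restoreCase` is a hypothetical helper, and it assumes that the tokens, concatenated in order, cover the whole input (as in the output above, where spaces and punctuation are kept) and that lowercasing did not change the number of runes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// restoreCase re-slices the original text using the rune length of each
// lowercased token, so every token gets its original casing back.
// Assumption: the tokens, concatenated, cover the whole text in order,
// and lowercasing did not change the number of runes.
func restoreCase(original string, tokens []string) []string {
	runes := []rune(original)
	out := make([]string, 0, len(tokens))
	pos := 0
	for _, tok := range tokens {
		n := utf8.RuneCountInString(tok)
		if pos+n > len(runes) {
			n = len(runes) - pos // guard against a length mismatch
		}
		out = append(out, string(runes[pos:pos+n]))
		pos += n
	}
	return out
}

func main() {
	text := "Hello world, Helloworld."
	// Tokens as a lowercasing segmenter might return them (hand-written here).
	lowered := []string{"hello", " ", "world", ",", " ", "helloworld", "."}
	fmt.Printf("%q\n", restoreCase(text, lowered))
}
```

The token list in `main` is written by hand to keep the example independent of any dictionary. In real use it would be the slice returned by `Cut`.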
ivory2406 commented
@CocaineCong Hello, could you help me with the toLower option parameter? I want to use gse to tokenize sentences and then encode the tokens with mmh3.
Whether a character is lowercase or uppercase is very important to me,
because a word's mmh3 value is different when it is lowercase than when it is uppercase.
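To illustrate why case matters for hashing, here is a minimal sketch. It uses FNV-1a from the Go standard library as a stand-in, because the standard library has no MurmurHash3 implementation. The hash values differ from mmh3, but the case sensitivity is the same. `hashToken` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashToken returns the 32-bit FNV-1a hash of a token.
// FNV-1a is only a stand-in for MurmurHash3 here; it is a different algorithm.
func hashToken(tok string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(tok)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

func main() {
	// The same word in two casings produces two different hash values.
	for _, tok := range []string{"Hello", "hello"} {
		fmt.Printf("%-6s -> %d\n", tok, hashToken(tok))
	}
}
```

The hash is computed over the token's bytes, so "Hello" and "hello" are different inputs and yield different values. A segmenter that lowercases its input would merge both into the same token hash.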