Package chinese provides utilities for dealing with Chinese text, including text segmentation.
Download:
go get github.com/smhanov/chinese
Chinese text is commonly written without any spaces between words. This package uses the Viterbi algorithm and word frequency information to find the best placement of spaces in a sentence.
It is designed to take up very little memory. In my tests, loading the default dictionary uses 160MB of RAM. However, the memory used for loading is released immediately afterward, so the total memory consumed by the dictionary of 589,000 words and frequencies is 1.1MB.
To use it, create a new text segmenter. By default, a model of word frequencies from the web is loaded. Then call Segment(), passing in some text. The return value is the text split into strings, each containing an individual word, an unrecognized word, or spaces and punctuation. Concatenating the results gives back the original input.
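The round-trip property described above can be checked with a few lines of Go. This sketch does not call the package: the `roundTrips` helper is hypothetical, and the `pieces` slice is written by hand to stand in for the kind of value Segment() is described as returning.

```go
package main

import (
	"fmt"
	"strings"
)

// roundTrips reports whether concatenating pieces reproduces input exactly.
// It is a hypothetical helper for this example, not part of the package.
func roundTrips(input string, pieces []string) bool {
	return strings.Join(pieces, "") == input
}

func main() {
	input := "我喜欢中文, 谢谢!"
	// Hand-written stand-in for a segmenter's output: words, plus the
	// spaces and punctuation kept as their own strings.
	pieces := []string{"我", "喜欢", "中文", ", ", "谢谢", "!"}
	fmt.Println(roundTrips(input, pieces)) // prints: true
}
```

Because nothing is dropped during segmentation, a caller can insert its own separators between the pieces or map each piece back to its position in the original text.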
Automatically generated by autoreadme on 2019.04.08