go-ego/gse

Split "2021年09月10日": I want to get "2021年 / 09月 / 10日"

mkdreams opened this issue · 1 comment

When I split "2021年09月10日", I want to get "2021年 / 09月 / 10日".
When I split "중국 규제 리스크에 울고 웃는 종목들현명하게 대응하려면", I want to get "중국 / 규제 / 리스크에...".

I use this code:
words := seg.Cut("2021年09月10日", true)

I changed the following code in the package, but I don't think that is a good solution. Is there a better way?
File: hmm_seg.go, line 27
regSkip = regexp.MustCompile(`(\d+年|\d+月|\d+日|[\p{Latin}]+|[\p{Hangul}]+|\d+\.\d+|[a-zA-Z0-9]+)`)

  • Gse version (or commit ref): v0.69.3
  • Go version: 1.16

I added CutDAG() with regexp support; you can use it to solve #114 too.

reg := regexp.MustCompile(`(\d+年|\d+月|\d+日|[\p{Latin}]+|[\p{Hangul}]+|\d+\.\d+|[a-zA-Z0-9]+)`)
text := `헬로월드 헬로 서울, 2021年09月10日, 3.14`
hmm := seg.CutDAG(text, reg)
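
To see which pieces the pattern above picks out on its own, here is a minimal sketch that uses only Go's standard regexp package. It does not call gse and does not reproduce how CutDAG() combines the regexp with the dictionary; matchTokens is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// datePattern is the expression quoted in this thread, copied unchanged.
var datePattern = regexp.MustCompile(`(\d+年|\d+月|\d+日|[\p{Latin}]+|[\p{Hangul}]+|\d+\.\d+|[a-zA-Z0-9]+)`)

// matchTokens is a hypothetical helper (not part of gse). It returns every
// non-overlapping match of datePattern in order of appearance. Text that
// matches no alternative (spaces, commas) is dropped.
func matchTokens(text string) []string {
	return datePattern.FindAllString(text, -1)
}

func main() {
	// Prints: 2021年 / 09月 / 10日
	fmt.Println(strings.Join(matchTokens("2021年09月10日"), " / "))

	// Prints: 헬로월드 / 헬로 / 서울 / 2021年 / 09月 / 10日 / 3.14
	fmt.Println(strings.Join(matchTokens("헬로월드 헬로 서울, 2021年09月10日, 3.14"), " / "))
}
```

The order of the alternatives matters: Go tries them left to right, so `\d+年` gets a chance to match before the generic `[a-zA-Z0-9]+`, which would otherwise capture only the digits and leave 年 unmatched.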