The stop-word dictionary never takes effect, even though I added it

Question

The stop-word dictionary never takes effect, even though I loaded it and added words to it.

ColorfulDick opened this issue 3 years ago · 7 comments

package main

import (
	"fmt"

	"github.com/go-ego/gse"
)

var (
	text   = "第一次爱的人是谁演唱的"
	new, _ = gse.New("dict.txt")

	seg gse.Segmenter
)

func main() {
	cut()
}

func cut() {
	new.LoadStop("stop.txt")
	new.AddStop("的")
	new.AddStop("是") // adding this line has no effect either
	fmt.Println("cut: ", new.Cut(text, true))
	fmt.Println("cut all: ", new.CutAll(text))
	fmt.Println("cut for search: ", new.CutSearch(text, true))
	fmt.Println(new.String(text, true))
}

// The console prints the following:
//2022/02/18 17:44:34 Dict files path: [dict.txt]
//2022/02/18 17:44:34 Load the gse dictionary: "dict.txt"
//2022/02/18 17:44:34 Gse dictionary loaded finished.
//2022/02/18 17:44:34 Load the stop word dictionary: "stop.txt"
//cut: [第一次爱的人是谁演唱的]
//cut all: [第一次爱的人是谁演唱的]
//cut for search: [第一次爱的人是谁演唱的]
//第一次爱的人/n 是/x 谁/x 演唱/v 的/x

Answer 1 · 2022-02-18T09:47:07.000Z

Even after calling AddStop, the stop words still have no effect.

Answer 2 · 2022-02-28T14:55:03.000Z

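Segmentation itself does not check whether a token is a stop word. You need to call seg.IsStop(string) yourself to decide whether a token is a stop word.

The usual approach is to segment first, then use the stop-word list as a filter and run the segmented tokens through it once.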

Answer 3 · 2022-03-01T22:51:23.000Z

Closed because the issue is non-standard.
Also, there are seg.Trim() and seg.Stop() functions.

Answer 4 · 2022-03-02T07:04:52.000Z

Can the stop words be removed before segmentation? Filtering again after every segmentation call seems like a waste of time.

Answer 5 · 2022-03-02T11:04:05.000Z

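As the author replied above, you can use seg.Trim() to post-process the segmentation result. Taking your example:

You can use it like this:

new.Trim(new.CutSearch(text, true))

After filtering, the result no longer contains stop words.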

Answer 6 · 2022-03-03T05:59:30.000Z

Oh! Thanks!

Answer 7 · 2022-03-03T05:59:45.000Z

Thanks a lot!