hiDaDeng/cnsenti

Problems when using a custom dictionary

AirFin opened this issue · 6 comments

Hello, I ran into some problems while running sentiment analysis with a custom dictionary and would like to ask for your advice.

Problem 1: the dictionary is not recognized. Here is the code.
`from cnsenti import Sentiment

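senti = Sentiment(pos=r"D:\情感词典\负面词典.txt",  # relative path to the positive dictionary txt file (file name reads "negative dictionary")
                  neg=r"D:\情感词典\正面词典.txt",  # relative path to the negative dictionary txt file (file name reads "positive dictionary")
                  encoding='unicode_escape')  # both txt files are UTF-8 encoded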

test_text = '这家公司是行业的引领者,是中流砥柱。今年的业绩非常好'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count',result1)
print('sentiment_calculate',result2)`

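Here is the output. The reported number of positive words is 0, even though "引领者" (leader) and "中流砥柱" (mainstay), both of which appear in this text, are positive words in the dictionary.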
`D:\Anaconda3\python.exe D:/pythonproject/test1/金融文本情感-中文-cnsenti.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\User\AppData\Local\Temp\jieba.cache
Loading model cost 0.675 seconds.
sentiment_count {'words': 15, 'sentences': 2, 'pos': 0, 'neg': 0}
Prefix dict has been built succesfully.
sentiment_calculate {'sentences': 2, 'words': 15, 'pos': 0, 'neg': 0}

Process finished with exit code 0
`
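
One thing that may be worth checking for Problem 1 is the `encoding` argument. The snippet below is a minimal sketch, not cnsenti's own code: it assumes the library passes `encoding` through to the file read, and it only shows what Python's built-in codecs do with UTF-8 bytes. Decoding UTF-8 bytes with `unicode_escape` treats every byte as a separate Latin-1 character, so a dictionary entry read that way would no longer equal the word that jieba produces from the text.

```python
# Illustrative check only; the variable names here are not part of cnsenti.
word = "中流砥柱"
raw = word.encode("utf-8")  # the bytes as they would be stored in a UTF-8 txt file

as_utf8 = raw.decode("utf-8")              # decoded with the file's real encoding
as_escape = raw.decode("unicode_escape")   # decoded with the codec used in the issue

print(as_utf8 == word)     # the UTF-8 decode round-trips
print(as_escape == word)   # the unicode_escape decode does not
print(len(as_utf8), len(as_escape))  # 4 characters versus one character per byte
```

If this is what is happening, passing `encoding='utf-8'` for UTF-8 files would be the first thing to try.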

Problem 2: if the text being analyzed ends with a period, an error is raised. Here is the code.
`from cnsenti import Sentiment

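senti = Sentiment(pos=r"D:\情感词典\负面词典.txt",  # relative path to the positive dictionary txt file (file name reads "negative dictionary")
                  neg=r"D:\情感词典\正面词典.txt",  # relative path to the negative dictionary txt file (file name reads "positive dictionary")
                  encoding='unicode_escape')  # both txt files are UTF-8 encoded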

test_text = '这家公司是行业的引领者,是中流砥柱。今年的业绩非常好。'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count',result1)
print('sentiment_calculate',result2)`

Here is the output.
`D:\Anaconda3\python.exe D:/pythonproject/test1/金融文本情感-中文-cnsenti.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\AppData\Local\Temp\jieba.cache
Loading model cost 0.685 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
  File "D:/pythonproject/test1/金融文本情感-中文-cnsenti.py", line 9, in <module>
    result2 = senti.sentiment_calculate(test_text)
  File "D:\Anaconda3\lib\site-packages\cnsenti\sentiment.py", line 218, in sentiment_calculate
    pos = np.sum(score_array[:, 0])
IndexError: too many indices for array

Process finished with exit code 1
`
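
A possible explanation for Problem 2, offered only as a hypothesis: if the text is split into sentences on sentence-ending punctuation, a trailing period leaves an extra empty sentence at the end. The sketch below uses a hypothetical `split_sentences` helper built on `re.split`; it has not been checked against cnsenti's source and only illustrates the splitting behaviour, together with filtering out empty pieces.

```python
import re


def split_sentences(text):
    """Hypothetical splitter: break on Chinese and ASCII sentence-ending punctuation."""
    return re.split(r"[。!?!?]", text)


without_period = "这家公司是行业的引领者,是中流砥柱。今年的业绩非常好"
with_period = without_period + "。"

parts_a = split_sentences(without_period)
parts_b = split_sentences(with_period)

# Dropping empty pieces makes both inputs yield the same sentences.
non_empty_b = [s for s in parts_b if s.strip()]

print(len(parts_a), len(parts_b), len(non_empty_b))
```

If the library behaves similarly, an empty final sentence could be what keeps `score_array` from being two-dimensional, which is the condition under which `score_array[:, 0]` raises `IndexError`. Removing the trailing period from the input, as in Problem 1, is a quick way to test this.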

Finally, thank you very much for your contribution. I hope you can help!