Problems when using a custom dictionary

Question


AirFin opened this issue 5 years ago · 6 comments

Hello, I ran into some problems while using the "custom dictionary" approach for sentiment analysis and would like to ask for your advice.

Problem 1: the dictionaries are not recognized. Here is the code.
`from cnsenti import Sentiment

senti = Sentiment(pos=r"D:\情感词典\负面词典.txt",  # relative path to the positive dictionary txt file
                  neg=r"D:\情感词典\正面词典.txt",  # relative path to the negative dictionary txt file
                  encoding='unicode_escape')  # both txt files are UTF-8 encoded

test_text = '这家公司是行业的引领者，是中流砥柱。今年的业绩非常好'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count', result1)
print('sentiment_calculate', result2)`

The output is below. The reported number of positive words is 0, even though 引领者 ("leader") and 中流砥柱 ("mainstay") in this text are both positive words in the dictionary.
`D:\Anaconda3\python.exe D:/pythonproject/test1/金融文本情感-中文-cnsenti.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\AppData\Local\Temp\jieba.cache
Loading model cost 0.675 seconds.
sentiment_count {'words': 15, 'sentences': 2, 'pos': 0, 'neg': 0}
Prefix dict has been built succesfully.
sentiment_calculate {'sentences': 2, 'words': 15, 'pos': 0, 'neg': 0}

Process finished with exit code 0
`

Problem 2: if the text being analyzed ends with a full stop (。), an error is raised. Here is the code.
`from cnsenti import Sentiment

senti = Sentiment(pos=r"D:\情感词典\负面词典.txt",  # relative path to the positive dictionary txt file
                  neg=r"D:\情感词典\正面词典.txt",  # relative path to the negative dictionary txt file
                  encoding='unicode_escape')  # both txt files are UTF-8 encoded

test_text = '这家公司是行业的引领者，是中流砥柱。今年的业绩非常好。'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count', result1)
print('sentiment_calculate', result2)`

The output is below.
`D:\Anaconda3\python.exe D:/pythonproject/test1/金融文本情感-中文-cnsenti.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\AppData\Local\Temp\jieba.cache
Loading model cost 0.685 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
  File "D:/pythonproject/test1/金融文本情感-中文-cnsenti.py", line 9, in <module>
    result2 = senti.sentiment_calculate(test_text)
  File "D:\Anaconda3\lib\site-packages\cnsenti\sentiment.py", line 218, in sentiment_calculate
    pos = np.sum(score_array[:, 0])
IndexError: too many indices for array

Process finished with exit code 1
`

Finally, thank you very much for your contribution. I hope you can help!

Answer 1 · 2020-04-12T03:00:32.000Z

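I just looked at the code you wrote:
`senti = Sentiment(pos=r"D:\情感词典\负面词典.txt",  # relative path to the positive dictionary txt file
                  neg=r"D:\情感词典\正面词典.txt",  # relative path to the negative dictionary txt file
                  encoding='unicode_escape')  # both txt files are UTF-8 encoded`

There are a few possible causes:

1. pos and neg stand for positive and negative, and you passed them the wrong way round: 负面词典.txt ("negative dictionary") is given as pos and 正面词典.txt ("positive dictionary") as neg.
2. The encoding parameter takes the encoding of the custom dictionary files. If your txt files are UTF-8, encoding='utf-8' is enough. I am not familiar with unicode_escape and do not know whether it also causes problems.
3. cnsenti uses jieba for word segmentation, and jieba may segment new sentiment words incorrectly. I have not developed this part in cnsenti yet; it will be improved later.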
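On point 2, the effect of the wrong codec can be checked without cnsenti. The sketch below assumes only that the dictionary file's bytes are decoded with whatever codec is passed as encoding; `load_words` and the temporary `pos.txt` file are hypothetical stand-ins, not part of cnsenti. Python's unicode_escape codec treats non-escape bytes as Latin-1, so UTF-8 Chinese text comes out as unrelated characters and no dictionary word can match.

```python
import tempfile
from pathlib import Path


def load_words(path, encoding):
    """Read one word per line, decoding the file's bytes with `encoding`."""
    text = Path(path).read_bytes().decode(encoding)
    return {line.strip() for line in text.splitlines() if line.strip()}


# Hypothetical stand-in for a UTF-8 positive dictionary file.
with tempfile.TemporaryDirectory() as tmp:
    dict_path = Path(tmp) / "pos.txt"
    dict_path.write_text("引领者\n中流砥柱\n", encoding="utf-8")

    words_utf8 = load_words(dict_path, "utf-8")
    words_escaped = load_words(dict_path, "unicode_escape")

print("中流砥柱" in words_utf8)     # True
print("中流砥柱" in words_escaped)  # False: the bytes were decoded as Latin-1
```

If the real loader behaves like this, the wrong codec alone would be enough to produce 'pos': 0 and 'neg': 0.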

Answer 2 · 2020-04-12T03:08:20.000Z

Thank you very much for your reply!

1. I had swapped the positive and negative dictionaries.
2. I changed the encoding to "utf-8", but the sentiment words are still not recognized.

In addition, regarding the second problem, the error raised when the analyzed text contains a full stop (。): is there any way to work around it? The code is as follows.
`from cnsenti import Sentiment

senti = Sentiment(pos=r"D:\情感词典\正面词典.txt",  # relative path to the positive dictionary txt file
                  neg=r"D:\情感词典\负面词典.txt",  # relative path to the negative dictionary txt file
                  encoding='utf-8')  # both txt files are UTF-8 encoded

test_text = '这家公司是行业的引领者，是中流砥柱。今年的业绩非常好。'
result1 = senti.sentiment_count(test_text)
result2 = senti.sentiment_calculate(test_text)
print('sentiment_count', result1)
print('sentiment_calculate', result2)`

The output is as follows.
`D:\Anaconda3\python.exe D:/pythonproject/test1/金融文本情感-中文-cnsenti.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\AppData\Local\Temp\jieba.cache
Loading model cost 0.739 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
  File "D:/pythonproject/test1/金融文本情感-中文-cnsenti.py", line 9, in <module>
    result2 = senti.sentiment_calculate(test_text)
  File "D:\Anaconda3\lib\site-packages\cnsenti\sentiment.py", line 218, in sentiment_calculate
    pos = np.sum(score_array[:, 0])
IndexError: too many indices for array

Process finished with exit code 1`

Thank you very much for your help!
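The traceback only shows that score_array could not be indexed as a two-dimensional array at line 218. One plausible cause, which is an assumption and has not been checked against cnsenti's source, is that splitting on 。 leaves an empty fragment after the final full stop, and that empty "sentence" yields no scores. The sketch below shows the empty fragment and a caller-side workaround; `SENTENCE_END` and `split_sentences` are hypothetical names, not cnsenti functions.

```python
import re

# Assumed set of sentence-ending marks; cnsenti's own list may differ.
SENTENCE_END = "。！？!?"


def split_sentences(text):
    """Split on sentence-ending punctuation and drop empty fragments."""
    parts = re.split(f"[{SENTENCE_END}]", text)
    return [p.strip() for p in parts if p.strip()]


text = "这家公司是行业的引领者，是中流砥柱。今年的业绩非常好。"

naive = text.split("。")
print(len(naive), repr(naive[-1]))  # 3 ''
print(len(split_sentences(text)))   # 2

# Caller-side workaround: strip trailing end marks before passing the text on.
trimmed = text.rstrip(SENTENCE_END)
print(len(trimmed.split("。")))      # 2
```

Until a fixed release is installed, passing `trimmed` instead of `text` avoids the trailing empty fragment, provided the assumption about the cause holds.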

Answer 3 · 2020-04-12T03:31:10.000Z

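On the first problem: I ran the code on my side and 中流砥柱 was recognized, so I see no issue there. I am working on the second problem you reported.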
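Answer 1 raised the possibility that jieba segments a new sentiment word incorrectly, in which case a dictionary lookup on the tokens would miss it. The toy below illustrates that effect with a greedy forward-maximum-matching segmenter; this is a deliberately simple substitute and not jieba's algorithm, and both vocabularies are made up for the illustration.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabularies: the base one lacks the custom word 引领者.
base_vocab = {"行业", "引领", "中流砥柱"}
custom_pos = {"引领者", "中流砥柱"}

text = "行业的引领者是中流砥柱"

before = forward_max_match(text, base_vocab)
after = forward_max_match(text, base_vocab | custom_pos)

print(before)  # ['行业', '的', '引领', '者', '是', '中流砥柱']
print(after)   # ['行业', '的', '引领者', '是', '中流砥柱']
print(sum(t in custom_pos for t in before))  # 1
print(sum(t in custom_pos for t in after))   # 2
```

With jieba itself, custom words can be registered through `jieba.add_word` or `jieba.load_userdict` before segmenting; whether cnsenti does this for custom dictionaries is not stated in this thread.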

Answer 4 · 2020-04-12T03:32:27.000Z

OK. Thank you again!

Answer 5 · 2020-04-12T04:13:31.000Z

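I have updated the code. You can uninstall cnsenti, wait about half an hour, and then install the new cnsenti. The multi-sentence problem has been fixed; see the cnsenti documentation: https://github.com/thunderhit/cnsenti/blob/master/README.md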
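After reinstalling, it can help to confirm which cnsenti version the interpreter actually sees. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, and the thread does not say which version number contains the fix, so the printed value has to be compared against the latest release by hand.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints None if cnsenti is not installed in the current environment.
print("cnsenti:", installed_version("cnsenti"))
```

Running this with the same interpreter that runs the analysis script (D:\Anaconda3\python.exe in the logs above) guards against installing into a different environment.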

Answer 6 · 2020-04-12T04:14:45.000Z

OK. Thank you very much! I wish you all the best in your work!
