Unable to run sumy in Jupyter Notebook
azamsharpschool opened this issue · 2 comments
I have been trying without success to get sumy to work in a Jupyter Notebook, but it always throws an error when the Tokenizer is created.
Here is my Jupyter Notebook code:
!python -c "import nltk; nltk.download('stopwords')"

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

text = "Your long text here..."
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summarizer = LsaSummarizer()
summary = summarizer(parser.document, 3)  # Summarize to 3 sentences
for sentence in summary:
    print(sentence)
When I run this code, I get the following error:
UnpicklingError Traceback (most recent call last)
Cell In[22], line 6
3 from sumy.summarizers.lsa import LsaSummarizer
5 text = "Your long text here..."
----> 6 parser = PlaintextParser.from_string(text, Tokenizer("english"))
7 summarizer = LsaSummarizer()
8 summary = summarizer(parser.document, 3) # Summarize to 3 sentences
File ~/Desktop/sample_project/env/lib/python3.10/site-packages/sumy/nlp/tokenizers.py:160, in Tokenizer.__init__(self, language)
157 self._language = language
159 tokenizer_language = self.LANGUAGE_ALIASES.get(language, language)
--> 160 self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
161 self._word_tokenizer = self._get_word_tokenizer(tokenizer_language)
File ~/Desktop/sample_project/env/lib/python3.10/site-packages/sumy/nlp/tokenizers.py:172, in Tokenizer._get_sentence_tokenizer(self, language)
170 try:
171     path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
--> 172     return nltk.data.load(path)
173 except (LookupError, zipfile.BadZipfile) as e:
174     raise LookupError(
175         "NLTK tokenizers are missing or the language is not supported.\n"
176         """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n"""
177         "Original error was:\n" + str(e)
178     )
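The traceback shows sumy's `Tokenizer` failing inside `nltk.data.load` while loading `tokenizers/punkt/<language>.pickle`. sumy's own message asks for the `punkt` resource, but the notebook downloads `stopwords`. Also, `!python ...` in a notebook runs whichever `python` is first on the shell PATH, which may not be the kernel's interpreter (here a virtualenv under `~/Desktop/sample_project/env`). What follows is a diagnostic sketch under those assumptions, not a confirmed fix. The helper names `download_cmd` and `try_load_punkt` are hypothetical. The sketch builds an `nltk.download` command for the kernel's own interpreter (`sys.executable`) and repeats sumy's load call directly, returning the exception instead of raising it.

```python
import importlib.util
import sys


def download_cmd(resource: str) -> list[str]:
    """Hypothetical helper: command running nltk.download(resource) with the kernel's interpreter.

    `!python ...` in a notebook uses whichever `python` is first on the shell PATH,
    which is not necessarily the virtualenv the kernel is running in.
    """
    return [sys.executable, "-c", f"import nltk; nltk.download({resource!r})"]


def try_load_punkt(language: str = "english"):
    """Hypothetical helper: load the same resource path sumy's Tokenizer builds.

    Mirrors traceback lines 171-172. Returns None if NLTK loads the resource,
    otherwise the exception that was raised, so it can be inspected without sumy.
    """
    if importlib.util.find_spec("nltk") is None:
        return ModuleNotFoundError("nltk is not importable from this kernel")
    import nltk

    try:
        nltk.data.load("tokenizers/punkt/%s.pickle" % language)
    except Exception as exc:  # e.g. LookupError (not downloaded) or UnpicklingError
        return exc
    return None
```

In a notebook cell, running `subprocess.run(download_cmd("punkt"), check=True)` (after `import subprocess`) and then `print(repr(try_load_punkt()))` shows whether NLTK can load the resource on its own. If it still reports an `UnpicklingError`, the failure reproduces without sumy and points at how the installed NLTK loads that pickle.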
What can I do to fix this issue?