miso-belica/sumy

Unable to run sumy in Jupyter Notebook


I have been trying without success to get sumy to work in a Jupyter Notebook, but it always throws an error when constructing the Tokenizer.

Here is my Jupyter Notebook code:

!python -c "import nltk; nltk.download('stopwords')"

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

text = "Your long text here..."
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summarizer = LsaSummarizer()
summary = summarizer(parser.document, 3)  # Summarize to 3 sentences

for sentence in summary:
    print(sentence)

When I run this code, I get the following error:


UnpicklingError                           Traceback (most recent call last)
Cell In[22], line 6
      3 from sumy.summarizers.lsa import LsaSummarizer
      5 text = "Your long text here..."
----> 6 parser = PlaintextParser.from_string(text, Tokenizer("english"))
      7 summarizer = LsaSummarizer()
      8 summary = summarizer(parser.document, 3)  # Summarize to 3 sentences

File ~/Desktop/sample_project/env/lib/python3.10/site-packages/sumy/nlp/tokenizers.py:160, in Tokenizer.__init__(self, language)
    157 self._language = language
    159 tokenizer_language = self.LANGUAGE_ALIASES.get(language, language)
--> 160 self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
    161 self._word_tokenizer = self._get_word_tokenizer(tokenizer_language)

File ~/Desktop/sample_project/env/lib/python3.10/site-packages/sumy/nlp/tokenizers.py:172, in Tokenizer._get_sentence_tokenizer(self, language)
    170 try:
    171     path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
--> 172     return nltk.data.load(path)
    173 except (LookupError, zipfile.BadZipfile) as e:
    174     raise LookupError(
    175         "NLTK tokenizers are missing or the language is not supported.\n"
    176         """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n"""
    177         "Original error was:\n" + str(e)
    178     )
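
Note the exception type: it is an UnpicklingError, which the except clause shown in the traceback does not catch (it only handles LookupError and zipfile.BadZipfile). So NLTK is finding a punkt pickle on disk but failing to unpickle it. The failure can be checked independently of sumy; a minimal reproduction sketch, assuming the same environment:

import nltk

# Tokenizer("english") performs exactly this load internally (see the
# traceback above), so if this line raises UnpicklingError the problem
# is in the local NLTK data rather than in sumy.
nltk.data.load("tokenizers/punkt/english.pickle")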

What can I do to fix this issue?
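
For anyone hitting the same thing: the first cell downloads only the stopwords corpus, while sumy's Tokenizer needs NLTK's punkt sentence models, and an UnpicklingError (rather than the LookupError sumy turns into a friendly message) usually means an earlier punkt download was interrupted and left corrupted files behind. A sketch of a likely fix, assuming the default ~/nltk_data download location, is to delete any stale punkt data and re-download it:

import os
import shutil

import nltk

# NLTK's default download directory is the first entry on its search
# path (usually ~/nltk_data); adjust if NLTK_DATA is set elsewhere.
data_dir = nltk.data.path[0]

# Remove punkt data possibly corrupted by an interrupted download.
for stale in ("tokenizers/punkt", "tokenizers/punkt.zip"):
    path = os.path.join(data_dir, stale)
    if os.path.isdir(path):
        shutil.rmtree(path)
    elif os.path.isfile(path):
        os.remove(path)

nltk.download("punkt")      # the sentence models sumy loads
nltk.download("punkt_tab")  # pickle-free variant used by newer NLTK
                            # releases; harmless on older ones

# Verify the exact load sumy performs now succeeds:
nltk.data.load("tokenizers/punkt/english.pickle")

After a clean re-download, restarting the Jupyter kernel and re-running the original cells should let Tokenizer("english") initialize normally.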