boudinfl/pke

TypeError: 'NoneType' object is not iterable for a string object

farazk86 opened this issue · 11 comments

Hi,

I am getting the above error even though the input is definitely a string, not None.

I have a list of 8000 strings that I want to analyze with pke. I am using the minimal code below to do so:

!pip install git+https://github.com/boudinfl/pke.git
import nltk
nltk.download('stopwords')
import pke

# quick_list holds the ~8000 input strings
pos = {'NOUN', 'PROPN', 'ADJ'}
mainlist = []

for text in quick_list:
  print(text)
  print(type(text))
  extractor = pke.unsupervised.SingleRank()
  extractor.load_document(input=text,
                          language='en',
                          normalization=None)
  extractor.candidate_selection(pos=pos)
  extractor.candidate_weighting(window=10,
                                pos=pos)
  keyphrases = extractor.get_n_best(n=3)       # 3 keywords from each ticket
  # get_n_best returns (keyphrase, score) pairs; keep the phrases only
  mainlist.append(', '.join(k[0] for k in keyphrases))

pke_list = mainlist

Below is the output and stack trace:

ERROR:root:No spacy model for 'en' language.
ERROR:root:A list of available spacy models is available at https://spacy.io/models.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
business world - user unable to log in
<class 'str'>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-027265c91041> in <module>()
     19   extractor.load_document(input=text,
     20                         language='en',
---> 21                         normalization=None)
     22   extractor.candidate_selection(pos=pos)
     23   extractor.candidate_weighting(window=10,

/usr/local/lib/python3.7/dist-packages/pke/base.py in load_document(self, input, language, stoplist, normalization, spacy_model)
    121 
    122         else:
--> 123             for i, sentence in enumerate(self.sentences):
    124                 self.sentences[i].stems = [w.lower() for w in sentence.words]
    125 

TypeError: 'NoneType' object is not iterable

As can be seen above, the printed text is not None and is of type str.

I even checked that no element of my list is None by iterating over all of them:

for text in quick_list:
  if text is None:
    print('We have a NoneType')

The above loop does not print anything.
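A slightly stricter version of the check above is sketched below. It reports the position of every entry that is not a non-empty string, so it also catches blank strings and non-string values (e.g. float NaN from a pandas column) that a None-only test misses. The helper name find_bad_inputs is hypothetical and not part of pke.

```python
from typing import Any, List, Sequence


def find_bad_inputs(texts: Sequence[Any]) -> List[int]:
    """Return the indices of entries that are not non-empty strings.

    Catches None, non-string values (such as float NaN from a pandas
    column) and strings that are empty or contain only whitespace.
    """
    bad = []
    for i, text in enumerate(texts):
        if not isinstance(text, str) or not text.strip():
            bad.append(i)
    return bad


quick_list = ['business world - user unable to log in', '', None, float('nan')]
print(find_bad_inputs(quick_list))  # [1, 2, 3]
```

An empty result means every entry is a usable string, so the cause lies elsewhere (in this thread it turned out to be the spaCy model).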

P.S. This may be related to version 2.0, as I did not have this problem a couple of months ago.

ygorg commented

Hi,
I think your issue is related to the spaCy models. Please check that you have downloaded the English spaCy model by running python -m spacy validate (and make sure that this python is the same interpreter that runs pke).
Please update this thread as necessary :)
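Both points can also be checked from inside the notebook or script that runs pke. The sketch below uses only the standard library: it prints the running interpreter's path and tests whether the en_core_web_sm package is importable by that interpreter. The helper name model_is_installed is hypothetical; python -m spacy validate remains the authoritative check.

```python
import importlib.util
import sys


def model_is_installed(package_name: str) -> bool:
    """Return True if package_name (e.g. the spaCy pipeline
    'en_core_web_sm') can be imported by the running interpreter."""
    # Refresh finder caches so packages installed after startup are seen.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package_name) is not None


# This should be the same interpreter that runs pke.
print("Python executable:", sys.executable)
print("en_core_web_sm importable:", model_is_installed("en_core_web_sm"))
```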

Hi @ygorg

I am still getting the same error. Below is the output after manually downloading the spaCy model:

Installing collected packages: en-core-web-sm
  Attempting uninstall: en-core-web-sm
    Found existing installation: en-core-web-sm 2.2.5
    Uninstalling en-core-web-sm-2.2.5:
      Successfully uninstalled en-core-web-sm-2.2.5
Successfully installed en-core-web-sm-3.2.0
Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
ERROR:root:No spacy model for 'en' language.
ERROR:root:A list of available spacy models is available at https://spacy.io/models.
business world - user unable to log in
<class 'str'>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-07fb979eea6a> in <module>()
     26   extractor.load_document(input=text,
     27                         language='en',
---> 28                         normalization=None)
     29   extractor.candidate_selection(pos=pos)
     30   extractor.candidate_weighting(window=10,

/usr/local/lib/python3.7/dist-packages/pke/base.py in load_document(self, input, language, stoplist, normalization, spacy_model)
    121 
    122         else:
--> 123             for i, sentence in enumerate(self.sentences):
    124                 self.sentences[i].stems = [w.lower() for w in sentence.words]
    125 

TypeError: 'NoneType' object is not iterable

And below is the output of python -m spacy validate:

Loaded compatibility table

================= Installed pipeline packages (spaCy v3.2.3) =================
 spaCy installation: /usr/local/lib/python3.7/dist-packages/spacy

NAME             SPACY            VERSION                            
en_core_web_sm   >=3.2.0,<3.3.0   3.2.0

ygorg commented

Hm, please try loading the spaCy model beforehand and passing the processed document to pke, to see whether the problem comes from spaCy or from pke.
I'm sorry that you are running into this problem.

import spacy
import pke

nlp = spacy.load('en_core_web_sm')  # load the spaCy model once

for text in quick_list:
    doc = nlp(text)  # process the text with spaCy first
    extractor = pke.unsupervised.SingleRank()
    extractor.load_document(
        input=doc, language='en',
        normalization=None
    )

Hi, loading spacy beforehand appears to be the solution. The extractor is working now.

Thanks :)

Hi @ygorg, I also want to mention that I have been getting the same error since the new version (2.0). I installed pke following the instructions (spaCy == 3.2.3) on a new virtual machine, and I get the error "'NoneType' object is not iterable". When I load the spaCy model beforehand, it works.
Note that on an older virtual machine, where I installed pke about 4 months ago, there is no error and pke works fine without loading the spaCy model beforehand.

Hello,

I can't reproduce your issue on my machine. Can you provide me with a minimal working example so I can examine further?

f.

ygorg commented

Hi, I also tried in a virtual environment and encountered no problems (see below).

Reproducing the issue
python3.7 -m venv test
source test/bin/activate
pip install -U setuptools wheel pip  # upgrade first to avoid problems when installing spacy
pip install git+https://github.com/boudinfl/pke.git
python -m spacy download en_core_web_sm
python <<< """
import nltk
nltk.download('stopwords')
import pke

quick_list = ['business world - user unable to log in']

for text in quick_list:
  print(text)
  print(type(text))
  pos = {'NOUN', 'PROPN', 'ADJ'}
  extractor = pke.unsupervised.SingleRank()
  extractor.load_document(input=text,
                        language='en',
                        normalization=None)
  extractor.candidate_selection(pos=pos)
  extractor.candidate_weighting(window=10,
                              pos=pos)
"""

Could you please provide the output of the following code? If spacy.util.get_installed_models() does not list the expected models, my guess is that there is a spaCy model linking issue between different versions of Python/spaCy.

import spacy
print(spacy.info())
print(spacy.util.get_installed_models())
nlp = spacy.load('en_core_web_sm')
print(nlp._path)  # should be in the same site-packages directory as spacy.info()['location']
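Comparing these paths by eye is error-prone, because the spaCy package and the model package live in sibling directories rather than at the same path. The sketch below (standard library only, POSIX paths assumed; the helper names site_packages_dir and same_environment are hypothetical) checks that both sit under the same site-packages or dist-packages directory. The example paths are taken from the output posted later in this thread.

```python
from pathlib import PurePosixPath


def site_packages_dir(path: str) -> PurePosixPath:
    """Return the closest enclosing 'site-packages' or 'dist-packages'
    directory of path, or raise ValueError if there is none."""
    p = PurePosixPath(path)
    for parent in [p, *p.parents]:
        if parent.name in ("site-packages", "dist-packages"):
            return parent
    raise ValueError(f"no site-packages directory in {path!r}")


def same_environment(spacy_location: str, model_path: str) -> bool:
    """True if spaCy and the model are installed in the same site-packages."""
    return site_packages_dir(spacy_location) == site_packages_dir(model_path)


# Values of spacy.info()['location'] and nlp._path as posted in this thread.
spacy_location = "/home/ubuntu/.local/lib/python3.6/site-packages/spacy"
model_path = ("/home/ubuntu/.local/lib/python3.6/site-packages/"
              "en_core_web_sm/en_core_web_sm-3.2.0")
print(same_environment(spacy_location, model_path))  # True
```

In a live session the arguments would be spacy.info()['location'] and str(nlp._path).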

Hi, this error didn't occur when I tried 3 weeks ago. Did something change in the new update? And yes, I tried loading spaCy beforehand, which solves this error. But a new problem arises:

candidate_selection() got an unexpected keyword argument 'stoplist'

It only accepts the pos parameter. Is there any way to pass a stoplist?

The documentation still shows the old implementation.

TIA

same here

Hello @ygorg, this is the output of your sample code:

{'spacy_version': '3.2.3', 'location': '/home/ubuntu/.local/lib/python3.6/site-packages/spacy', 'platform': 'Linux-5.4.0-1071-aws-x86_64-with-Ubuntu-18.04-bionic', 'python_version': '3.6.9', 'pipelines': {'en_core_web_sm': '3.2.0', 'en_core_web_trf': '3.2.0'}}
['en_core_web_sm', 'en_core_web_trf']
/home/ubuntu/.local/lib/python3.6/site-packages/en_core_web_sm/en_core_web_sm-3.2.0

Hi,

I just updated the docs for pke v2.0.

@EdwardChan5000 and @Karthick47v2, from v2.0 the stoplist parameter (i.e. a list of stopwords) should be passed to the load_document() function instead of candidate_selection().

Here is a minimal example:

import pke

extractor = pke.unsupervised.FirstPhrases()

extractor.load_document(input='input text', language='en', stoplist=["list", "of", "stopwords"])

extractor.candidate_selection()

...

By default, the spaCy stoplist corresponding to the language parameter is used.
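To illustrate what a stoplist changes, here is a toy sketch in plain Python; it is not pke's code. It assumes, as a simplification, that a candidate phrase is discarded when any of its words is in the stoplist; pke's own filtering rules may differ in detail. The function name filter_candidates is hypothetical.

```python
from typing import Iterable, List


def filter_candidates(candidates: Iterable[str], stoplist: Iterable[str]) -> List[str]:
    """Keep only candidate phrases that contain no stopword (case-insensitive)."""
    stop = {w.lower() for w in stoplist}
    kept = []
    for phrase in candidates:
        words = phrase.lower().split()
        if not any(w in stop for w in words):
            kept.append(phrase)
    return kept


candidates = ["business world", "user", "log in", "list of stopwords"]
print(filter_candidates(candidates, ["list", "of", "stopwords", "in"]))
# ['business world', 'user']
```

Passing a custom stoplist therefore changes which phrases survive to the weighting step, which is why it now belongs with the document loading rather than with candidate selection.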

f.