AttributeError: [E046] while summarizing with PositionRank/Biased TextRank
zykerli opened this issue · 4 comments
Hello,
I'm trying to use the PositionRank and Biased TextRank algorithms you provide for the German language, with the following code:
import spacy
spacy_model = "de_core_news_lg"
spacy_nlp = spacy.load(name=spacy_model,disable=["lemmatizer"])
spacy_nlp.add_pipe(factory_name="positionrank", name="positionrank", last=True)
text = "Das ist ein Test. Bitte fasse mich zusammen!"
import pytextrank
doc = spacy_nlp(text)
summary = list(doc._.positionrank.summary(limit_phrases=1, limit_sentences=1, preserve_order=False))
Unfortunately, it throws AttributeError: [E046]. It looks like ._.positionrank is not implemented. The same code works fine when replacing "positionrank" with "textrank" (using doc._.textrank). I'm using pytextrank version 3.1.1.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-117-5f8747633f63> in <module>
11
12
---> 13 summary = list(doc._.positionrank.summary(limit_phrases=1, limit_sentences=1, preserve_order=False))
14 print(summary)
~/PycharmProjects/Test_project/venv/lib/python3.8/site-packages/spacy/tokens/underscore.py in __getattr__(self, name)
30 def __getattr__(self, name):
31 if name not in self._extensions:
---> 32 raise AttributeError(Errors.E046.format(name=name))
33 default, method, getter, setter = self._extensions[name]
34 if getter is not None:
AttributeError: [E046] Can't retrieve unregistered extension attribute 'positionrank'. Did you forget to call the `set_extension` method?
EDIT
As I can see from pytextrank/pytextrank/positionrank.py, lines 23-50 (see below), PositionRank is set with set_extension, but the extension is still named "textrank" (and not "positionrank").
def __call__ (
    self,
    doc: Doc,
    ) -> Doc:
    """
    Set the extension attributes on a `spaCy` [`Doc`](https://spacy.io/api/doc)
    document to create a *pipeline component* for `PositionRank` as
    a stateful component, invoked when the document gets processed.

    See: <https://spacy.io/usage/processing-pipelines#pipelines>

    doc:
    a document container, providing the annotations produced by earlier stages of the `spaCy` pipeline
    """
    Doc.set_extension("textrank", force=True, default=None)
    Doc.set_extension("phrases", force=True, default=[])

    doc._.textrank = PositionRank(
        doc,
        edge_weight = self.edge_weight,
        pos_kept = self.pos_kept,
        token_lookback = self.token_lookback,
        scrubber = self.scrubber,
        stopwords = self.stopwords,
    )

    doc._.phrases = doc._.textrank.calc_textrank()
    return doc
My code at the beginning runs when I change the last line from
summary = list(doc._.positionrank.summary(limit_phrases=1, limit_sentences=1, preserve_order=False))
to
summary = list(doc._.textrank.summary(limit_phrases=1, limit_sentences=1, preserve_order=False))
But is PositionRank really being used, or TextRank?
Maybe extending the tutorial to cover the algorithms besides TextRank would clarify things.
Hey @dblaszcz,
Thank you for sharing the detailed description of the issue.
As you rightly figured out, applying any one of the variants "textrank", "positionrank", or "biasedtextrank" attaches the extension textrank to the doc.
You can verify which algorithm is actually in use by checking the type of doc._.textrank:
print(type(doc._.textrank))
# prints
# in case of textrank
# <class 'pytextrank.base.BaseTextRank'>
# in case of positionrank
# <class 'pytextrank.positionrank.PositionRank'>
# in case of biasedtextrank
# <class 'pytextrank.biasedrank.BiasedTextRank'>
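To make the naming behaviour concrete, here is a toy model in plain Python. It is only a sketch: the classes, the `FACTORIES` dict, and `run_component` are hypothetical stand-ins, not spaCy's or pytextrank's actual code. It shows why looking up the factory name fails while the shared name still tells you which algorithm ran.

```python
# Toy model only: hypothetical stand-ins, not spaCy's or pytextrank's real code.

class BaseTextRank:
    """Stand-in for the plain TextRank implementation."""


class PositionRank(BaseTextRank):
    """Stand-in for the PositionRank variant."""


class BiasedTextRank(BaseTextRank):
    """Stand-in for the Biased TextRank variant."""


# Factory name -> class that the pipeline component instantiates.
FACTORIES = {
    "textrank": BaseTextRank,
    "positionrank": PositionRank,
    "biasedtextrank": BiasedTextRank,
}


def run_component(factory_name):
    """Return the extension dict a processed document would carry."""
    extensions = {}
    # Every variant stores its result under the same key, "textrank",
    # regardless of which factory name was used to add the component.
    extensions["textrank"] = FACTORIES[factory_name]()
    return extensions


ext = run_component("positionrank")
print("positionrank" in ext)            # False: no attribute with that name
print(type(ext["textrank"]).__name__)   # PositionRank
```

So the attribute name stays fixed and only the class of the stored object changes, which is why the type check is the reliable way to tell the variants apart.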
One more thing I noticed in the first snippet you shared: the statement
import pytextrank
should be placed before
spacy_nlp.add_pipe(factory_name="positionrank", name="positionrank", last=True)
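The reason is that importing pytextrank is what registers its factories with spaCy, so the factory name has to exist before add_pipe looks it up. The mechanism can be sketched in plain Python as below; `REGISTRY`, `factory`, `add_pipe`, and `import_plugin` are hypothetical names for illustration only, not spaCy's real registry.

```python
# Toy model only: hypothetical names, not spaCy's actual registry.
REGISTRY = {}


def factory(name):
    """Decorator that records a component constructor under `name`."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


def add_pipe(name):
    """Look the factory up by name, as a pipeline would before adding it."""
    if name not in REGISTRY:
        raise ValueError(f"no factory registered for '{name}'")
    return REGISTRY[name]()


def import_plugin():
    """Stands in for `import pytextrank`: defining the function registers it."""
    @factory("positionrank")
    def create_positionrank():
        return "positionrank component"


try:
    add_pipe("positionrank")  # plugin not imported yet, so the lookup fails
    failed_before_import = False
except ValueError:
    failed_before_import = True

import_plugin()
component = add_pipe("positionrank")  # succeeds once the plugin is loaded
print(failed_before_import, component)
```

In the real snippet the equivalent fix is simply to move `import pytextrank` up next to `import spacy`.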
I hope it helps.
I've noticed that the pipeline extensions tend to not show up in the spaCy pipeline analysis, for example when running:
print("pipeline", nlp.pipe_names)
nlp.analyze_pipes(pretty=True)
I can raise a question on the spaCy forums to find out if there are ways to register pipeline extensions.
I see the extension in the pipeline analysis using this snippet.
import spacy
import pytextrank
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("positionrank", last=True)
assert "positionrank" in nlp.pipe_names
assert "positionrank" in nlp.analyze_pipes()['summary']
Output looks like this for me
>>> nlp.analyze_pipes(pretty=True)['summary']
============================= Pipeline Overview =============================
#   Component         Assigns               Requires   Scores             Retokenizes
-   ---------------   -------------------   --------   ----------------   -----------
0   tok2vec           doc.tensor                                          False
1   tagger            token.tag                        tag_acc            False
2   parser            token.dep                        dep_uas            False
                      token.head                       dep_las
                      token.is_sent_start              dep_las_per_type
                      doc.sents                        sents_p
                                                       sents_r
                                                       sents_f
3   ner               doc.ents                         ents_f             False
                      token.ent_iob                    ents_p
                      token.ent_type                   ents_r
                                                       ents_per_type
4   attribute_ruler                                                       False
5   lemmatizer        token.lemma                      lemma_acc          False
6   positionrank                                                          False
✔ No problems found.
Maybe it's a version issue, @ceteri? (I'm using spacy==3.0.6 and pytextrank==3.1.2 for this test.)
Thank you @louisguitton.
Looking at this again: since pytextrank assigns custom attributes, these don't show up in the pipeline analysis.