2 test cases failing with spacy latest version (3.1.0)
Ankush-Chander opened this issue · 3 comments
tests/test_base.py:180: AssertionError
=========================== short test summary info ============================
FAILED tests/test_base.py::test_summary - assert [[0, [0, 9, 2... [9, 4]], .....
FAILED tests/test_base.py::test_stop_words - AssertionError: assert ['sentenc...
========================= 2 failed, 6 passed in 2.67s ==========================
To add to this, here is the full verbosity output:
(.env39) PS C:\GitHub\pytextrank> pytest -vvv
==================================================================================== test session starts ====================================================================================
platform win32 -- Python 3.9.4, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- c:\github\pytextrank\.env39\scripts\python.exe
cachedir: .pytest_cache
rootdir: C:\GitHub\pytextrank
collected 8 items
tests/test_base.py::test_base_text_rank PASSED [ 12%]
tests/test_base.py::test_add_pipe PASSED [ 25%]
tests/test_base.py::test_summary FAILED [ 37%]
tests/test_base.py::test_multiple_summary PASSED [ 50%]
tests/test_base.py::test_stop_words FAILED [ 62%]
tests/test_biasedrank.py::test_default_biased_rank PASSED [ 75%]
tests/test_biasedrank.py::test_biased_rank PASSED [ 87%]
tests/test_positionrank.py::test_position_rank PASSED [100%]
========================================================================================= FAILURES ==========================================================================================
_______________________________________________________________________________________ test_summary ________________________________________________________________________________________
nlp = <spacy.lang.en.English object at 0x00000285FCB914C0>
    def test_summary (nlp: Language):
        """
        Summarization produces the expected results.
        """
        # given
        expected_trace = [
            [0, [0, 2, 6, 7, 8]],
            [1, [8]],
            [2, [2]],
            [7, [8, 4]],
            [8, [8]],
            [11, [2]],
            [12, [1]],
            [14, [2, 5]],
            [15, [9, 3, 7]],
            [17, [2]],
        ]

        with open("dat/lee.txt", "r") as f:
            text = f.read()

        doc = nlp(text)
        tr = doc._.textrank

        # calculates *unit vector* and sentence distance measures
        # when
        trace = [
            [ sent_dist.sent_id, list(sent_dist.phrases) ]
            for sent_dist in tr.calc_sent_dist(limit_phrases=10)
            if not sent_dist.empty()
        ]

        # then
>       assert trace == expected_trace
E assert [[0, [0, 9, 2, 6]],\n [1, [9]],\n [2, [2]],\n [3, [7]],\n [4, [8]],\n [6, [9, 4]],\n [7, [9]],\n [9, [2]],\n [10, [1]],\n [12, [2, 5]],\n [13, [3]],\n [15, [2]]] == [[0, [0, 2, 6, 7, 8]],\n [1, [8]],\n [2, [2]],\n [7, [8, 4]],\n [8, [8]],\n [11, [2]],\n [12, [1]],\n [14, [2, 5]],\n [15, [9, 3, 7]],\n [17, [2]]]
E At index 0 diff: [0, [0, 9, 2, 6]] != [0, [0, 2, 6, 7, 8]]
E Left contains 2 more items, first extra item: [13, [3]]
E Full diff:
E [
E [0,
E [0,
E + 9,
E 2,
E - 6,
E - 7,
E - 8]],
E ? ^
E + 6]],
E ? ^
E [1,
E - [8]],
E ? ^
E + [9]],
E ? ^
E [2,
E [2]],
E + [3,
E + [7]],
E + [4,
E + [8]],
E + [6,
E + [9,
E + 4]],
E [7,
E - [8,
E - 4]],
E - [8,
E - [8]],
E ? ^
E + [9]],
E ? ^
E - [11,
E + [9,
E [2]],
E + [10,
E + [1]],
E [12,
E - [1]],
E - [14,
E [2,
E 5]],
E + [13,
E + [3]],
E [15,
E - [9,
E - 3,
E - 7]],
E - [17,
E [2]],
E ]
tests\test_base.py:116: AssertionError
______________________________________________________________________________________ test_stop_words ______________________________________________________________________________________
    def test_stop_words ():
        """
        Works as a pipeline component and can be disabled.
        """
        # given
        nlp = spacy.load("en_core_web_sm")
        nlp.add_pipe("textrank", config={ "stopwords": { "word": ["NOUN"] } })

        # when
        _expected_phrases = [
            "sentences",
            "Mihalcea et al",
            "text summarization",
            "gensim implements",
            "Okapi BM25 function",
        ]

        # add `"word": ["NOUN"]` to the *stop words*, to remove instances
        # of `"word"` or `"words"` then see how the ranked phrases differ?

        # then
        with open("dat/gen.txt", "r") as f:
            doc = nlp(f.read())

        phrases = [
            phrase.text
            for phrase in doc._.phrases[:5]
        ]

>       assert phrases == _expected_phrases
E AssertionError: assert ['sentences',\n 'Mihalcea et al',\n 'et al',\n 'Barrios et al',\n 'gensim implements TextRank'] == ['sentences',\n 'Mihalcea et al',\n 'text summarization',\n 'gensim implements',\n 'Okapi BM25 function']
E At index 2 diff: 'et al' != 'text summarization'
E Full diff:
E [
E 'sentences',
E 'Mihalcea et al',
E - 'text summarization',
E + 'et al',
E + 'Barrios et al',
E - 'gensim implements',
E + 'gensim implements TextRank',
E ? +++++++++
E - 'Okapi BM25 function',
E ]
tests\test_base.py:180: AssertionError
================================================================================== short test summary info ==================================================================================
FAILED tests/test_base.py::test_summary - assert [[0, [0, 9, 2, 6]],\n [1, [9]],\n [2, [2]],\n [3, [7]],\n [4, [8]],\n [6, [9, 4]],\n [7, [9]],\n [9, [2]],\n [10, [1]],\n [12, [2, 5]],\n ...
FAILED tests/test_base.py::test_stop_words - AssertionError: assert ['sentences',\n 'Mihalcea et al',\n 'et al',\n 'Barrios et al',\n 'gensim implements TextRank'] == ['sentences',\n 'Mih...
================================================================================ 2 failed, 6 passed in 2.36s ================================================================================
Edit:
This seems to be related to a change in doc.noun_chunks between spacy 3.0.0 and spacy 3.1.0. I ran the following program with both versions of spacy to verify this:
import spacy  # pylint: disable=E0401
import pytextrank  # registers the "textrank" pipeline factory
from pprint import pprint

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank", config={ "stopwords": { "word": ["NOUN"] } })

with open("dat/gen.txt", "r") as f:
    doc = nlp(f.read())

pprint(list(doc.noun_chunks))
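As an aside, doc.noun_chunks only needs the model's tagger and parser, so the textrank pipe is not strictly required for this check. A small variation (hypothetical, not part of the original report) writes one chunk per line to a version-stamped file, so the output of the two environments can be compared with a plain diff:

import spacy

nlp = spacy.load("en_core_web_sm")

with open("dat/gen.txt", "r") as f:
    doc = nlp(f.read())

# one noun chunk per line, in a version-stamped file, so that the
# outputs of the two environments can be compared with `diff`
out_path = f"noun_chunks-spacy-{spacy.__version__}.txt"
with open(out_path, "w", encoding="utf-8") as out:
    for chunk in doc.noun_chunks:
        out.write(chunk.text + "\n")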
See the diff of the outputs here. The raw outputs can be seen below:
Spacy 3.1.0
[First, a quick description,
some popular algorithms,
implementations,
text summarization,
the summarization module,
gensim implements TextRank,
an unsupervised algorithm,
weighted-graphs,
a paper,
Mihalcea,
et al,
It,
another incubator student,
Olavur Mortensen â€,
his previous post,
this blog,
It,
top,
the popular PageRank algorithm,
Google,
ranking webpages,
TextRank,
Pre,
-,
the text,
words,
the remaining words,
a graph,
vertices,
sentences,
every sentence,
every other sentence,
an edge,
The weight,
the edge,
the two sentences,
the PageRank,
algorithm,
the graph,
the vertices(sentences,
the highest PageRank score,
original TextRank,
the weights,
an edge,
two sentences,
the percentage,
words,
them,
Gensim’s TextRank,
Okapi BM25 function,
the sentences,
It,
an improvement,
a paper,
Barrios et al]
Spacy 3.0.0
[First, a quick description,
some popular algorithms,
implementations,
text summarization,
the summarization module,
gensim implements,
TextRank,
an unsupervised algorithm,
weighted-graphs,
a paper,
Mihalcea et al,
It,
another incubator student,
Olavur Mortensen â€,
his previous post,
this blog,
It,
top,
the popular PageRank algorithm,
Google,
webpages,
TextRank,
-,
the text,
words,
the remaining words,
a graph,
vertices,
sentences,
every sentence,
every other sentence,
an edge,
The weight,
the edge,
the two sentences,
the PageRank algorithm,
the graph,
the vertices(sentences,
the highest PageRank score,
original TextRank,
the weights,
an edge,
two sentences,
the percentage,
words,
them,
Gensimâ€,
TextRank,
Okapi BM25 function,
the sentences,
It,
an improvement,
a paper,
Barrios,
et al]
This is doubtless the cause of the failing tests.
Considering PTR should preferably support spacy 3.1.0 as well as older versions, the tests should ideally pass with whichever version of spacy is installed. However, this seems to require a different set of expected outcomes for each range of spacy versions. I'm sure that this can be done without much hassle.
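For instance (a sketch; the 3.1.0 boundary is the one observed above, and packaging is a small extra dependency), the expected fixture could be chosen based on the installed spacy version:

import spacy
from packaging.version import Version

# expected top-5 phrases for the stop words test, per spacy release;
# the 3.1.0 boundary matches the noun_chunks change shown above
if Version(spacy.__version__) >= Version("3.1.0"):
    _expected_phrases = [
        "sentences",
        "Mihalcea et al",
        "et al",
        "Barrios et al",
        "gensim implements TextRank",
    ]
else:
    _expected_phrases = [
        "sentences",
        "Mihalcea et al",
        "text summarization",
        "gensim implements",
        "Okapi BM25 function",
    ]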
- Tom Aarsen
I have updated the stopword test to be agnostic of the spacy version here.
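One way to achieve that (a sketch of the idea; the actual change may differ) is to assert the invariant the stop word is meant to enforce, rather than an exact phrase list:

import spacy
import pytextrank  # noqa: F401  (registers the "textrank" pipeline factory)

def test_stop_words ():
    """
    Adding `"word": ["NOUN"]` to the stop words should remove every
    ranked phrase built around "word"/"words", no matter how the
    noun chunker splits the surrounding text.
    """
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("textrank", config={ "stopwords": { "word": ["NOUN"] } })

    with open("dat/gen.txt", "r") as f:
        doc = nlp(f.read())

    # check the invariant instead of an exact, version-dependent list
    for phrase in doc._.phrases[:5]:
        tokens = phrase.text.lower().split()
        assert "word" not in tokens
        assert "words" not in tokens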
The summary test is still tricky, and remains sensitive to changes across spacy model upgrades. One alternative I have in mind is to only test parameters like `limit_sentences` and `preserve_order`, and check for a change in sentence ordering, rather than the exact order of sentences. A sketch of that idea follows.
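Something like this (the summary() parameters are from the existing API; the test name and assertions are the new part, and they assume summary() yields spaCy Span objects):

import spacy
import pytextrank  # noqa: F401  (registers the "textrank" pipeline factory)

def test_summary_properties ():
    """
    Test invariants of the summary (sentence count, document order)
    instead of exact sentence IDs, which shift across model upgrades.
    """
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("textrank")

    with open("dat/lee.txt", "r") as f:
        doc = nlp(f.read())

    tr = doc._.textrank

    # limit_sentences caps how many sentences come back
    sents = list(tr.summary(limit_phrases=10, limit_sentences=4))
    assert len(sents) <= 4

    # preserve_order=True keeps the summary in document order
    ordered = list(tr.summary(limit_phrases=10, limit_sentences=4, preserve_order=True))
    starts = [ sent.start for sent in ordered ]
    assert starts == sorted(starts)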
Nice work @tomaarsen and @Ankush-Chander!

It's good to see the improvements with the default spaCy noun chunker, and I suspect that we'll see more like this.

For the summarization tests, we can probably limit to a Top-K approach like that used in recsys benchmarks. In this case, a top-5 would be stable.
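For example (a sketch; the test name and the 3-of-5 threshold are illustrative, with the expected set taken from the current fixture):

import spacy
import pytextrank  # noqa: F401  (registers the "textrank" pipeline factory)

def test_stop_words_top_k ():
    """
    Top-K comparison: require substantial overlap with the expected
    top-5 phrases, rather than an exact, order-sensitive match.
    """
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("textrank", config={ "stopwords": { "word": ["NOUN"] } })

    with open("dat/gen.txt", "r") as f:
        doc = nlp(f.read())

    expected_top5 = {
        "sentences",
        "Mihalcea et al",
        "text summarization",
        "gensim implements",
        "Okapi BM25 function",
    }
    actual_top5 = { phrase.text for phrase in doc._.phrases[:5] }

    # 3-of-5 tolerates noun-chunker drift across spacy releases;
    # the exact threshold is a judgment call
    assert len(actual_top5 & expected_top5) >= 3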
In any case, it's probably best to support the later release, and let 3.0.x break the unit tests?