danieldeutsch/sacrerouge

PyrEval

ZhangShiyue opened this issue · 5 comments

Hi, thanks for this awesome toolkit!

I encountered an error with PyrEval. I tried to run:

summary = "Dundee United Striker Nadir CIFTCI celebrated a goal by blowing a kiss at opposition goalkeeper Scott Bain . The 23 - year - old celebrated by trying to Rile Dundee No 1 Bain , but his actions came back to haunt him as the Dark Blues earned all three points thanks to further goals from Jake McPake and Paul Heffernan . Dundee's first win in a derby for more than 10 years ."

ref = "nadir ciftci celebrated by blowing a kiss at rival goalkeeper scott bain . however , ciftci was left blushing as rivals earned impressive victory . win gave hosts dundee their first derby win in more than a decade . goals from greg stewart , jake mcpake and paul heffernen secured win ."

pyreval.score(summary, [ref])

Here is the verbose log:

../Preprocess/peer_summaries
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.5 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 6.997 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [8.2 sec].

Processing file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/.gitkeep ... writing to /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Preprocess/peer_summaries/.gitkeep.xml
Annotating file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/.gitkeep ... done [0.1 sec].
Processing file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/0 ... writing to /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Preprocess/peer_summaries/0.xml
Annotating file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/0 ... done [0.8 sec].
Processing file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/1 ... writing to /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Preprocess/peer_summaries/1.xml
Annotating file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/1 ... done [0.2 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.1 sec.
ParserAnnotator: 0.8 sec.
DependencyParseAnnotator: 0.1 sec.
TOTAL: 1.0 sec. for 120 tokens at 114.9 tokens/sec.
Pipeline setup: 9.4 sec.
Total time for StanfordCoreNLP pipeline: 10.6 sec.
DECOMPOSING SENTENCES FROM SUMMARY ../Preprocess/peer_summaries/0.xml
VECTORIZING SEGMENTS FROM SUMMARY ../Preprocess/peer_summaries/0.xml
/ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Preprocess/ormf/ormf.py:112: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
segment.setVector(np.linalg.lstsq(num, den)[0])
DECOMPOSING SENTENCES FROM SUMMARY ../Preprocess/peer_summaries/1.xml
VECTORIZING SEGMENTS FROM SUMMARY ../Preprocess/peer_summaries/1.xml
Time: 1.70384001732

Welcome to the PyrEval Launcher.

NOTES:

  • All model summary files should be in ./Raw/model/
  • All peer summary files should be in ./Raw/peers/
  • The Stanford Core NLP Tools package should be in ./Stanford/

0: Automatic mode (not recommended)

1: Preprocess - Split sentences
2: Run Stanford Core NLP Tools
3: Preprocess - Main
4: Build pyramids
5: Score pyramids

c: Clean directories
i: Change python interpreter

To quit, type nothing and press return.

['../Preprocess/wise_crowd_summaries/0.xml', '../Preprocess/wise_crowd_summaries/1']
4
4
4
Traceback (most recent call last):
File "/ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Pyramid/pyramid.py", line 94, in <module>
BigSet2 = pairwise(segmentpool, N, threshold)
File "/ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Pyramid/lib_pyramid.py", line 341, in pairwise
Q3 = np.percentile(np.asarray(scores), threshold)
File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3707, in percentile
a, q, axis, out, overwrite_input, interpolation, keepdims)
File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3826, in _quantile_unchecked
interpolation=interpolation)
File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3405, in _ureduce
r = func(a, **kwargs)
File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3941, in _quantile_ureduce_func
x1 = take(ap, indices_below, axis=axis) * weights_below
File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 189, in take
return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

I did run pytest sacrerouge/tests/metrics/pyreval_test.py, which looks normal to me:
E AssertionError: Instance 1 not equal. Expected {'pyreval': {'raw': 16, 'quality': 0.47058823529411764, 'coverage': 0.3404255319148936, 'comprehensive': 0.4055068836045056}}, actual {'pyreval': {'raw': 17, 'quality': 0.5, 'coverage': 0.3617021276595745, 'comprehensive': 0.4308510638297872}}

sacrerouge/common/testing/metric_test_cases.py:42: AssertionError
=============================== short test summary info ================================
FAILED sacrerouge/tests/metrics/pyreval_test.py::TestPyrEval::test_pyreval - Assertio...
======================= 1 failed, 3 passed in 377.74s (0:06:17) ========================

So I don't know why this error always occurs when evaluating my examples. I also tried some other [summary, ref] pairs, but they all throw the same error. Do you have any idea why this happens? Any hint would be helpful. Thank you so much!

Hi,

I think this is a bug in the original PyrEval code: it crashes if there is only 1 reference summary. I've run into this problem before too.

When the pyramid is constructed, there is a pairwise similarity set created here
https://github.com/serenayj/PyrEval/blob/b44bb991cf82c30e473b02534e1dbc2687747091/Pyramid/pyramid.py#L84-L94

That calls the pairwise function, which gets all combinations of the segments across reference summaries via the combinations function:
https://github.com/serenayj/PyrEval/blob/b44bb991cf82c30e473b02534e1dbc2687747091/Pyramid/lib_pyramid.py#L306-L312

summs has length 1 because there is only 1 reference, so there are no pairwise combinations: summ_pairs is empty and scores stays empty. np.percentile is then called on that empty array, which raises the IndexError at the bottom of your traceback.
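
To illustrate why a single reference yields no pairs, here is a minimal sketch using only the standard library. It is not the PyrEval code; the helper name reference_pairs and the placeholder strings are made up for this example. It only shows how itertools.combinations behaves when asked for pairs from a one-element list.

```python
from itertools import combinations


def reference_pairs(references):
    """Return every unordered pair of reference summaries (hypothetical helper)."""
    return list(combinations(references, 2))


one_ref = ["ref A"]
two_refs = ["ref A", "ref B"]

# With a single reference there is nothing to pair up, so the list is empty.
print(len(reference_pairs(one_ref)))   # 0

# With two references there is exactly one pair.
print(len(reference_pairs(two_refs)))  # 1
```

Any similarity scores computed per pair would therefore also be empty in the single-reference case, which matches the empty array seen in the traceback.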

I think if you do need to run it with 1 reference summary, it's probably best to open an issue here.

It's also worth knowing that if you do have multiple references, the score depends on the order of the references, which ends up being platform dependent. See here and my note about it here.

Hi, thanks for your prompt reply!

I tried repeating the reference: pyreval.score(summary, [ref, ref])
That runs without an error. Do you think this is a valid solution?

I am not sure, sorry. You would have to ask the authors of PyrEval. I only wrote a wrapper around their implementation.
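
Independent of whether duplicating the reference is valid, calling code could at least fail early with a readable message instead of the NumPy IndexError. The following is a rough sketch only: check_references is a hypothetical helper, not part of sacrerouge or PyrEval, and the minimum of 2 is an assumption based on the crash described in this thread.

```python
def check_references(references, minimum=2):
    """Raise a clear error if too few references are given (hypothetical helper).

    The default minimum of 2 is an assumption taken from the single-reference
    crash discussed in this thread, not a documented PyrEval requirement.
    """
    if len(references) < minimum:
        raise ValueError(
            "Expected at least %d reference summaries, got %d"
            % (minimum, len(references))
        )
    return references


# Passes through unchanged when enough references are supplied.
refs = check_references(["first reference", "second reference"])

# A single reference is rejected before any scoring is attempted.
try:
    check_references(["only reference"])
except ValueError as error:
    print(error)
```

The checked list could then be passed on to pyreval.score(summary, refs) as in the calls above.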

No worries. Sure, thank you so much!

If it gets fixed upstream, I am happy to merge the changes here. Thanks!