csebuetnlp/xl-sum

cross lingual

Closed this issue · 8 comments

Hi

Can I use this metric for cross-lingual evaluation?

Hi @mars203030,

No, ROUGE requires the reference and prediction text to be in the same language. For cross-lingual evaluation, you can look at this metric.
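To illustrate why, here is a toy unigram-overlap score. It is a simplified stand-in for ROUGE-1, not this package's implementation, and the function name and example sentences are made up for illustration. With an English reference and an Arabic prediction the two token sets share nothing, so the score is zero no matter how faithful the prediction is.

```python
from collections import Counter


def rouge1_f(reference: str, prediction: str) -> float:
    """Toy ROUGE-1 F-measure over lowercased whitespace tokens (no stemming)."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Clipped unigram overlap: each token counts at most as often as in both texts.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Same language: most tokens match.
same_lang = rouge1_f("the patient has a fever", "the patient has a high fever")
# Different languages: no surface tokens match, so the score is 0.0.
cross_lang = rouge1_f("the patient has a fever", "المريض يعاني من الحمى")

print(same_lang, cross_lang)
```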

Thank you so much, I will try it.

But with your ROUGE library I get the error below. Another question: which Arabic stemmer do I need to install?

```
ImportError Traceback (most recent call last)
Cell In[15], line 3
1 import sys
2 sys.path.append('/multilingual_rouge_scoring')
----> 3 from multilingual_rouge_scoring import rouge_scorer
6 scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True, lang="arabic")
8 scores = scorer.get_scores(conversation_ar, ar_note)

File ~/Downloads/Visualization/NeuroNLP/attempt3/multilingual_rouge_scoring/rouge_scorer.py:37
35 from six.moves import range
36 from rouge_score import scoring
---> 37 from rouge_score import tokenization_wrapper as tokenize
38 import pyonmttok
39 import collections

ImportError: cannot import name 'tokenization_wrapper' from 'rouge_score'
```

1 import sys
2 sys.path.append('/multilingual_rouge_scoring')
----> 3 from multilingual_rouge_scoring import rouge_scorer
6 scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True, lang="arabic")
8 scores = scorer.get_scores(conversation_ar, ar_note)

This is not how you are supposed to use this library. First, install the package with the instructions given here. Then follow these examples on how to use this package with python.

Another question: which Arabic stemmer do I need to install?

This repo uses the NLTK SnowballStemmer module for Arabic. It'll be installed automatically when you install our package.

I am still having an issue here.

Here is my installation:

[image] [image]

And this is the code and the error:

[image]

Your notebook isn't running in the same environment where you installed the package. I've replicated the correct workflow in this colab notebook. Please follow this.
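A quick way to confirm such a mismatch from inside the notebook is to print the interpreter path and check whether the fork's module is visible to it. This is a generic diagnostic sketch using only the standard library; `module_available` is a hypothetical helper, and `rouge_score.tokenization_wrapper` is the module named in the traceback above.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be found by the interpreter running this code."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first, so a missing parent raises here.
        return False


# The path of the Python interpreter backing this kernel.
print("Interpreter:", sys.executable)
# True for both the upstream package and the multilingual fork.
print("rouge_score installed:", module_available("rouge_score"))
# Expected to be True only when the multilingual fork is the installed one.
print("fork module present:", module_available("rouge_score.tokenization_wrapper"))
```

If the interpreter path is not the environment where the package was installed, or the last line prints `False`, the kernel is not seeing the multilingual package. Restarting the kernel after installation, or installing from within the notebook itself, are the usual remedies.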

Thanks, I restarted the kernel and it is working fine.
I have some questions regarding LaSE. It is working; I am comparing English text (reference) and Arabic text (predicted).

1. I would like to know what a good LaSE score is and what its range is. Below is my result for one input:

   ```python
   from LaSE import LaSEScorer
   scorer = LaSEScorer()

   score = scorer.score(
       clinical_note,
       conversation_ar,
       # language name of the reference text
   )

   print(score)
   ```

   LaSEResult(ms=0.6220683, lc=1.0, lp=1.0, LaSE=0.6220682859420776)

2. If I define target_lang, I receive this error: ValueError: predict processes one line at a time (remove '\n')
3. Is there a maximum length? My generated text is around 4000 words.
4. For the ROUGE score, is there also a maximum length?
5. There is a small difference in the results for English text summarization between the original Google package and the multilingual package. How is the difference explained?

   Google: {'rouge1': Score(precision=0.37389380530973454, recall=0.49852507374631266, fmeasure=0.42730720606826805), 'rouge2': Score(precision=0.07982261640798226, recall=0.10650887573964497, fmeasure=0.09125475285171102), 'rougeL': Score(precision=0.17699115044247787, recall=0.2359882005899705, fmeasure=0.202275600505689), 'rougeLsum': Score(precision=0.32964601769911506, recall=0.4421364985163205, fmeasure=0.37769328263624846)}

   MLRouge (English): {'rouge1': Score(precision=0.3893805309734513, recall=0.5191740412979351, fmeasure=0.44500632111251587), 'rouge2': Score(precision=0.08869179600886919, recall=0.11834319526627218, fmeasure=0.10139416983523449), 'rougeL': Score(precision=0.18584070796460178, recall=0.24778761061946902, fmeasure=0.21238938053097345), 'rougeLsum': Score(precision=0.33849557522123896, recall=0.4540059347181009, fmeasure=0.3878326996197719)}

Regards

I would like to know what a good LaSE score is and what its range is.

The value range for LaSE is [0, 1]. In general, we found good summaries to have a LaSE score > 0.5.

If I define target_lang, I receive this error: ValueError: predict processes one line at a time (remove '\n')

The target evaluation domain of this metric was short, single-line summaries. Therefore, as indicated by the error, you'd need to make sure your reference and prediction texts don't contain new lines.
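A minimal sketch of one way to do that before scoring, assuming it is acceptable to treat line breaks as ordinary spaces. `to_single_line` is a hypothetical helper, not part of the LaSE package.

```python
def to_single_line(text: str) -> str:
    """Collapse newlines, tabs and repeated spaces into single spaces."""
    # str.split() with no arguments splits on any run of whitespace,
    # including "\n" and "\r\n", and drops leading/trailing whitespace.
    return " ".join(text.split())


note = "Patient reports fever.\nNo cough.\r\n\r\n  Follow up in a week.\n"
flat_note = to_single_line(note)
print(flat_note)
```

The flattened strings would then be passed as the reference and prediction arguments in place of the raw multi-line texts.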

Is there a maximum length? My generated text is around 4000 words.

The embedding model behind LaSE, namely LaBSE, only supports sequences up to 512 tokens.

For the ROUGE score, is there also a maximum length?

No.

There is a small difference in the results for English text summarization between the original Google package and the multilingual package. How is the difference explained?

The difference is in the tokenization, stemming and character filtering policies. For example, the Google package removes all non-alphanumeric characters and applies stemming when token length exceeds a threshold, which we don't do to enable multilingual evaluation. Please see both implementations to get a better idea of all the differences.
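A rough illustration of the character-filtering point, assuming a simplified version of the Google-style filter (lowercase, then replace every run of characters outside a-z and 0-9 with a space) and plain whitespace splitting as a stand-in for a tokenizer that does no filtering. Both function names are hypothetical, and neither package's stemming is reproduced here.

```python
import re
from typing import List


def ascii_alnum_tokens(text: str) -> List[str]:
    """Simplified Google-style filter: keep only runs of a-z and 0-9."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def whitespace_tokens(text: str) -> List[str]:
    """Stand-in for a tokenizer that applies no character filtering."""
    return text.lower().split()


english = "State-of-the-art results, e.g. 95%"
arabic = "المريض يعاني من الحمى"  # illustrative sentence, not from the thread

# Punctuation is stripped and hyphenated words are split apart.
print(ascii_alnum_tokens(english))
# Punctuation stays attached to the tokens.
print(whitespace_tokens(english))
# Every Arabic character is filtered out, leaving no tokens at all.
print(ascii_alnum_tokens(arabic))
# All four Arabic tokens survive.
print(whitespace_tokens(arabic))
```

Even on English input the two policies yield different token lists, which is enough to shift n-gram counts slightly. On Arabic input the ASCII-only filter leaves nothing to score, which is why such filtering cannot be kept for multilingual evaluation.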

Thank you very much for your generous reply and patience.