cross lingual
Hi
Can I use this metric for cross-lingual evaluation?
Hi @mars203030,
No, ROUGE requires the reference and prediction text to be in the same language. For cross-lingual evaluation, you can look at this metric.
Thank you so much, I will try it.
But with your ROUGE library I get the error below. Another question: which Arabic stemmer do I need to install?
```
ImportError Traceback (most recent call last)
Cell In[15], line 3
1 import sys
2 sys.path.append('/multilingual_rouge_scoring')
----> 3 from multilingual_rouge_scoring import rouge_scorer
6 scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True, lang="arabic")
8 scores = scorer.get_scores(conversation_ar, ar_note)
File ~/Downloads/Visualization/NeuroNLP/attempt3/multilingual_rouge_scoring/rouge_scorer.py:37
35 from six.moves import range
36 from rouge_score import scoring
---> 37 from rouge_score import tokenization_wrapper as tokenize
38 import pyonmttok
39 import collections
ImportError: cannot import name 'tokenization_wrapper' from 'rouge_score'
```
1 import sys
2 sys.path.append('/multilingual_rouge_scoring')
----> 3 from multilingual_rouge_scoring import rouge_scorer
6 scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True, lang="arabic")
8 scores = scorer.get_scores(conversation_ar, ar_note)
This is not how you are supposed to use this library. First, install the package by following the instructions given here. Then follow these examples on how to use the package with Python.
Another question: which Arabic stemmer do I need to install?
This repo uses the NLTK SnowballStemmer module for Arabic. It'll be installed automatically when you install our package.
Your notebook isn't running in the same environment where you installed the package. I've replicated the correct workflow in this colab notebook. Please follow this.
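If it helps, here is a minimal sketch for checking which interpreter the notebook kernel is actually using and whether a given module can be imported from it. It uses only the standard library; the helper name `describe_environment` is made up for illustration, and `rouge_score` is simply the import name that appears in the traceback above.

```python
import importlib.util
import sys


def describe_environment(module_name: str) -> dict:
    """Report the running interpreter and whether a top-level module is importable from it."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,            # path of the Python running this kernel
        "module": module_name,
        "importable": spec is not None,           # False if the module is not installed here
        "location": getattr(spec, "origin", None),  # file the module would be loaded from
    }


# Run this in a notebook cell and compare "interpreter" with the
# environment where the package was installed.
info = describe_environment("rouge_score")
print(info)
```

If `importable` is `False`, or `interpreter` points somewhere other than the environment you installed into, the kernel and the install location most likely differ.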
Thanks, I restarted the kernel and it is working fine.
I have some questions regarding LaSE. It is working; I am comparing an English text (reference) with an Arabic text (prediction).
1) I would like to know what a good LaSE score is and what the range is.
Below is my result for one input:
```
from LaSE import LaSEScorer
scorer = LaSEScorer()
score = scorer.score(
    clinical_note,
    conversation_ar,
    # language name of the reference text
)
print(score)
```
LaSEResult(ms=0.6220683, lc=1.0, lp=1.0, LaSE=0.6220682859420776)
2) If I define target_lang, I receive this error: ValueError: predict processes one line at a time (remove '\n')
3) Is there a maximum length? My generated text is around 4000 words.
4) Is there also a maximum length for the ROUGE score?
5) There is a small difference in the results for English text summarization between the original Google package and the multilingual package. How is the difference explained?
google : {'rouge1': Score(precision=0.37389380530973454, recall=0.49852507374631266, fmeasure=0.42730720606826805), 'rouge2': Score(precision=0.07982261640798226, recall=0.10650887573964497, fmeasure=0.09125475285171102), 'rougeL': Score(precision=0.17699115044247787, recall=0.2359882005899705, fmeasure=0.202275600505689), 'rougeLsum': Score(precision=0.32964601769911506, recall=0.4421364985163205, fmeasure=0.37769328263624846)}
MLRouge: English : {'rouge1': Score(precision=0.3893805309734513, recall=0.5191740412979351, fmeasure=0.44500632111251587), 'rouge2': Score(precision=0.08869179600886919, recall=0.11834319526627218, fmeasure=0.10139416983523449), 'rougeL': Score(precision=0.18584070796460178, recall=0.24778761061946902, fmeasure=0.21238938053097345), 'rougeLsum': Score(precision=0.33849557522123896, recall=0.4540059347181009, fmeasure=0.3878326996197719)}
Regards
I would like to know what a good LaSE score is and what the range is.
The value range for LaSE is [0, 1]. In general, we found good summaries to have a LaSE score > 0.5.
If I define target_lang, I receive this error: ValueError: predict processes one line at a time (remove '\n')
The target evaluation domain of this metric was short, single-line summaries. Therefore, as indicated by the error, you'd need to make sure your reference and prediction texts don't contain newlines.
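One way to do that is to flatten each text to a single line before scoring. The sketch below is only an illustration: `to_single_line` is a hypothetical helper, not part of LaSE, and the sample string is invented.

```python
def to_single_line(text: str) -> str:
    """Collapse newlines and any other whitespace runs into single spaces."""
    # str.split() with no arguments splits on every whitespace run,
    # including "\n" and "\r\n", and drops empty pieces.
    return " ".join(text.split())


reference = "Patient reports chest pain.\nNo fever.\r\n\r\nFollow-up in two weeks."
print(to_single_line(reference))
# Patient reports chest pain. No fever. Follow-up in two weeks.
```

Apply the same helper to both the reference and the prediction, then pass the flattened strings to the scorer as before.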
Is there a maximum length? My generated text is around 4000 words.
The embedding model behind LaSE, namely LaBSE, only supports sequences up to 512 tokens.
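As a rough pre-check, you can compare the whitespace word count against that limit. This sketch assumes the model's subword tokenizer produces at least one token per whitespace-separated word, so a word count above 512 guarantees the token count is above 512 too. The reverse does not hold: a text under 512 words can still exceed 512 tokens. The function and constant names are made up for illustration.

```python
LABSE_MAX_TOKENS = 512  # limit quoted in the reply above


def surely_exceeds_limit(text: str, limit: int = LABSE_MAX_TOKENS) -> bool:
    """Return True when the whitespace word count alone already exceeds the limit.

    Assumption: every whitespace-separated word maps to at least one model token,
    so this is a lower bound. A False result does NOT prove the text fits.
    """
    return len(text.split()) > limit


long_text = "word " * 4000   # stand-in for a text of about 4000 words
print(surely_exceeds_limit(long_text))     # True
print(surely_exceeds_limit("short text"))  # False
```

For an exact count you would need to run the model's own tokenizer on the text.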
Is there also a maximum length for the ROUGE score?
No.
There is a small difference in the results for English text summarization between the original Google package and the multilingual package. How is the difference explained?
The difference is in the tokenization, stemming and character filtering policies. For example, the Google package removes all non-alphanumeric characters and applies stemming when the token length exceeds a threshold. We do neither, so that multilingual evaluation remains possible. Please see both implementations to get a better idea of all the differences.
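To see how much a tokenization policy alone can move the score, here is a toy sketch. It is not the code of either package: `tokenize_alnum` only imitates the "remove non-alphanumeric characters" idea, `tokenize_whitespace` is an arbitrary stand-in for a policy that keeps those characters, stemming is left out entirely, and the example sentences are invented.

```python
import re
from collections import Counter


def tokenize_alnum(text: str) -> list:
    """Lowercase, replace every non-alphanumeric run with a space, then split."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def tokenize_whitespace(text: str) -> list:
    """Lowercase and split on whitespace only, keeping punctuation attached."""
    return text.lower().split()


def unigram_f1(reference_tokens: list, prediction_tokens: list) -> float:
    """Unigram-overlap F1 (the idea behind ROUGE-1) using clipped token counts."""
    overlap = sum((Counter(reference_tokens) & Counter(prediction_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "The patient's fever is gone."
prediction = "the patient fever is gone"

for name, tokenize in [("alnum", tokenize_alnum), ("whitespace", tokenize_whitespace)]:
    score = unigram_f1(tokenize(reference), tokenize(prediction))
    print(name, round(score, 3))
# alnum 0.909
# whitespace 0.6
```

The same pair of texts gets two different scores purely because `patient's` and `gone.` are split or kept differently, which is the kind of effect behind the small gaps in your numbers.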
Thank you very much for your generous reply and patience.