Yale-LILY/SummEval

Value Error in Bert_Score_Metric

Closed this issue · 2 comments

I was trying to use the BertScoreMetric object and got an error when using evaluate_example. I modified lines 50-51 of the bert_score_metric.py file to fix it:

score = [{"bert_score_precision": p.cpu().item(), "bert_score_recall": r.cpu().item(), "bert_score_f1":
f.cpu().item()} for (p, r, f) in all_preds]

into

score = {"bert_score_precision": all_preds[0].cpu().item(), "bert_score_recall": all_preds[1].cpu().item(), "bert_score_f1":
all_preds[2].cpu().item()}

The change is minor, but if you can think of any reason why the original code did not work on my end, please let me know.
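One plausible explanation for the original failure: bert-score's score() returns a single tuple of three tensors (P, R, F1), one value per candidate/reference pair, rather than a list of per-example triples. Iterating over all_preds therefore yields the three tensors themselves, and unpacking a one-element tensor into (p, r, f) raises exactly this ValueError. A minimal sketch under that assumption (the tensor values below are made up):

import torch

# With a single candidate/reference pair, each tensor holds one value
# (the numbers here are placeholders, not real BERTScore outputs):
all_preds = (torch.tensor([0.91]), torch.tensor([0.88]), torch.tensor([0.89]))

# The original list comprehension iterates over the tuple itself, so it tries
# to unpack a one-element tensor into (p, r, f) and raises
# "ValueError: not enough values to unpack (expected 3, got 1)".

# Zipping the three tensors yields one (p, r, f) triple per example and also
# keeps working for batches:
score = [
    {"bert_score_precision": p.cpu().item(),
     "bert_score_recall": r.cpu().item(),
     "bert_score_f1": f.cpu().item()}
    for (p, r, f) in zip(*all_preds)
]
print(score)  # one dict per example

The indexing fix above covers the single-example case; zip(*all_preds) would additionally keep multi-example batches working.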

Console Output:
Traceback (most recent call last):
File "/home/stephen/git/project/src/metrics/test1.py", line 9, in
score = metric.evaluate_example(hyp, ref)
File "/home/stephen/anaconda3/envs/summary_evals/lib/python3.9/site-packages/summ_eval/bert_score_metric.py", line 50, in evaluate_example
score = [{"bert_score_precision": p.cpu().item(), "bert_score_recall": r.cpu().item(), "bert_score_f1":
File "/home/stephen/anaconda3/envs/summary_evals/lib/python3.9/site-packages/summ_eval/bert_score_metric.py", line 51, in
f.cpu().item()} for (p, r, f) in all_preds]
ValueError: not enough values to unpack (expected 3, got 1)

Python: 3.9.5
Installed Libraries:
bert-score 0.3.9
blanc 0.2.1
blis 0.7.4
boto3 1.18.9
botocore 1.21.9
catalogue 2.0.4
certifi 2021.5.30
charset-normalizer 2.0.3
click 7.1.2
cycler 0.10.0
cymem 2.0.5
filelock 3.0.12
gin-config 0.4.0
huggingface-hub 0.0.12
idna 3.2
Jinja2 3.0.1
jmespath 0.10.0
joblib 1.0.1
kiwisolver 1.3.1
MarkupSafe 2.0.1
matplotlib 3.4.2
moverscore 1.0.3
murmurhash 1.0.5
networkx 2.6.2
nltk 3.6.2
numpy 1.21.1
packaging 21.0
pandas 1.3.1
pathy 0.6.0
Pillow 8.3.1
pip 21.1.3
portalocker 2.0.0
preshed 3.0.5
protobuf 3.17.3
psutil 5.8.0
pydantic 1.8.2
pyemd 0.5.1
pyparsing 2.4.7
pyrouge 0.1.3
python-dateutil 2.8.2
pytorch-pretrained-bert 0.6.2
pytz 2021.1
PyYAML 5.4.1
regex 2021.7.6
requests 2.26.0
s3transfer 0.5.0
sacrebleu 1.5.1
sacremoses 0.0.45
scikit-learn 0.24.2
scipy 1.7.0
setuptools 52.0.0.post20210125
six 1.16.0
sklearn 0.0
smart-open 5.1.0
spacy 3.1.1
spacy-legacy 3.0.8
srsly 2.4.1
stanza 1.2.2
summ-eval 0.70
thinc 8.0.8
threadpoolctl 2.2.0
tokenizers 0.10.3
torch 1.9.0
tqdm 4.61.2
transformers 4.9.1
typer 0.3.2
typing 3.7.4.3
typing-extensions 3.10.0.0
urllib3 1.26.6
wasabi 0.8.2
wheel 0.36.2
wmd 1.3.2

Thanks for pointing that out! Updated.

Still happening...

# Import the required libraries

from summ_eval.rouge_metric import RougeMetric
from summ_eval.bleu_metric import BleuMetric
from summ_eval.meteor_metric import MeteorMetric
from summ_eval.bert_score_metric import BertScoreMetric

print(1)

# Create the metric objects you want to use

rouge = RougeMetric()
bleu = BleuMetric()
meteor = MeteorMetric()
bertscore = BertScoreMetric()

# Define your system-generated summary and the reference summary

summary = "The author focuses on the problem of group learning of College students. Based on the information of college studentsin Jiangsu Province, the author constructs the individual portrait of college students in three aspects: learning ability, learningbehavior and learning achievement. The author uses K-means algorithm to construct the group portrait of College students, andproposes to strengthen the vocational planning education of liberal arts majors and the category of mathematical physics. Teachingsuggestions. The experimental results show that the group of editors can be divided into four sub-categories. The main reasons forgrading are poor attitudes towards learning (academic) and learning difficulties in mathematics and physics courses. The main reasonfor grading liberal arts students is poor attitudes towards learning (academic)." # 这应该是你的系统生成的摘要
reference = "This study focuses on the issue of group learning among college students. The author utilized data from college students in Jiangsu Province to create an individual profile of each student based on three aspects: learning ability, learning behavior, and learning achievement. Using the K-means algorithm, the author then constructed a group profile of college students and suggested specific teaching recommendations to enhance vocational planning education for liberal arts majors and mathematical physics categories. The experiment results reveal that the college student group can be segmented into four subcategories, with poor learning attitudes and struggles with mathematics and physics courses being the main reasons for lower grades. Specifically, students in liberal arts tend to have poorer attitudes towards learning (academic)." # 这应该是你的参考摘要

print(2)

# Evaluate each metric

rouge_score = rouge.evaluate_example(summary, reference)
bleu_score = bleu.evaluate_example(summary, reference)

meteor_score = meteor.evaluate_example(summary, reference)

bertscore_score = bertscore.evaluate_example(summary, reference)

try:
    bertscore_score = bertscore.evaluate_example(summary, reference)
    print('BertScore:', bertscore_score)
except ValueError as e:
    print('BertScore evaluation failed:', str(e))

print(rouge_score)
print(bleu_score)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']

  • This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    hash_code: bert-base-uncased_L8_no-idf_version=0.3.12(hug_trans=4.18.0)
    BertScore evaluation failed: not enough values to unpack (expected 3, got 1)
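If the released summ-eval version (0.70 here) still contains the old line, one possible local workaround is to bypass the metric's evaluate_example and call the bert-score package directly. The subclass below is a hypothetical sketch, not part of summ_eval, and the two strings are placeholders:

from bert_score import score as bert_score_fn
from summ_eval.bert_score_metric import BertScoreMetric

class PatchedBertScoreMetric(BertScoreMetric):
    def evaluate_example(self, summary, reference):
        # bert-score returns a (P, R, F1) tuple of tensors, one value per pair
        P, R, F1 = bert_score_fn([summary], [reference], lang="en")
        return {
            "bert_score_precision": P[0].item(),
            "bert_score_recall": R[0].item(),
            "bert_score_f1": F1[0].item(),
        }

bertscore = PatchedBertScoreMetric()
print(bertscore.evaluate_example("a system summary", "a reference summary"))

Swapping the placeholder strings for the summary and reference variables from the script above should then print the three BERTScore values instead of failing.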