Adding MoverScore
forrestbao opened this issue · 5 comments
Like BERTScore and BLEURT, MoverScore is another modern, transformer-based, reference-based summarization metric.
However, we did not include it in our pilot study. Now may be a good time to add it.
Unfortunately, HF's `evaluate` library does not include it. But the original authors seem to have provided a good package: https://pypi.org/project/moverscore/ and the GitHub source is here: https://github.com/AIPHES/emnlp19-moverscore
Let's add it. Note: to be fair and square (#10), let's use a RoBERTa-large-based model. To select a model, see here. The model name is the model's name on Hugging Face, so we can simply use RoBERTa-large (the general pretrained checkpoint, not a task-specific fine-tune).
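To our reading of `moverscore_v2.py` (linked in the refs in this thread), the v2 module takes the Hugging Face model name from the `MOVERSCORE_MODEL` environment variable when it is imported, falling back to `distilbert-base-uncased`; please double-check this against the source. A minimal sketch, assuming that behavior, of selecting RoBERTa-large (the helper `select_moverscore_model` is a hypothetical name, not part of the package):

```python
import os

# Hugging Face Hub model ID for RoBERTa-large.
MODEL_NAME = "roberta-large"


def select_moverscore_model(model_name: str = MODEL_NAME) -> str:
    """Set the model moverscore_v2 should load.

    Assumption: moverscore_v2 reads MOVERSCORE_MODEL at import time,
    so this must run BEFORE `import moverscore_v2`.
    """
    os.environ["MOVERSCORE_MODEL"] = model_name
    return os.environ["MOVERSCORE_MODEL"]


select_moverscore_model()
# Import only after the variable is set, e.g.:
# from moverscore_v2 import get_idf_dict, word_mover_score
```

Because the variable is read at import, setting it after the module is already loaded would have no effect in the same process.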
Is this to be included in EvalBase or DocAsRef?
I think DocAsRef.
How did you run your experiments? I think you should define your metrics and then import EvalBase's top-level functions to evaluate them, right?
I think MoverScore should be added into DocAsRef. Any metrics developed or benchmarked by us for the ACL 2023 submission should go to DocAsRef.
To evaluate, just go to `EvalBase/env.py`, import the metrics from the DocAsRef folder, and then add the imported metrics to the `metrics` dictionary. Then run EvalBase's experiment files, i.e., `{summeval, realsumm, newsroom}.py`.
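A rough sketch of the registration step, assuming (hypothetically) that each entry in the `metrics` dictionary maps a name to a callable that takes a list of system summaries and a list of references and returns one score per pair. The real interface in `EvalBase/env.py` may differ. `moverscore_placeholder` is a hypothetical stand-in that computes a simple token-overlap ratio, not MoverScore, so the example runs without the `moverscore` package installed.

```python
from typing import Callable, Dict, List

# Hypothetical metric interface: (system summaries, references) -> scores.
Metric = Callable[[List[str], List[str]], List[float]]


def moverscore_placeholder(cands: List[str], refs: List[str]) -> List[float]:
    """Stand-in for a DocAsRef MoverScore wrapper (hypothetical name).

    NOT MoverScore: returns the fraction of reference tokens that also
    appear in the candidate, just so the registration can be exercised.
    """
    scores = []
    for cand, ref in zip(cands, refs):
        cand_tokens, ref_tokens = set(cand.split()), set(ref.split())
        scores.append(len(cand_tokens & ref_tokens) / max(len(ref_tokens), 1))
    return scores


# In env.py this would be the existing `metrics` dictionary.
metrics: Dict[str, Metric] = {}
metrics["moverscore-roberta-large"] = moverscore_placeholder
```

In practice the placeholder would be replaced by a function imported from the DocAsRef folder that wraps the real MoverScore call.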
`moverscore_v2` runs on CPU, while `moverscore` runs on GPU but does not let us change the model.
Refs:
https://github.com/AIPHES/emnlp19-moverscore/blob/master/moverscore.py
https://github.com/AIPHES/emnlp19-moverscore/blob/master/moverscore_v2.py
That's because MoverScore_v2 does not move tensors to the GPU.
Compare all the lines in MoverScore that have `device=device` with the corresponding lines in MoverScore_v2, which lack this kwarg. Or search for `cuda:0` or `.to(device` in the MoverScore code.
For example, these lines in MoverScore:
```python
padded, lens, mask = padding(arr, pad_token, dtype=torch.long)
padded_idf, _, _ = padding(idf_weights, pad_token, dtype=torch.float)
padded = padded.to(device=device)
mask = mask.to(device=device)
lens = lens.to(device=device)
return padded, padded_idf, lens, mask, tokens
```
versus these lines in MoverScore_v2:
```python
padded, lens, mask = padding(arr, pad_token, dtype=torch.long)
padded_idf, _, _ = padding(idf_weights, pad_token, dtype=torch.float)
return padded, padded_idf, lens, mask, tokens
```
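To repeat the comparison suggested in this thread on local copies of `moverscore.py` and `moverscore_v2.py`, a small stdlib-only scanner can list every line containing `device=device`, `cuda:0`, or `.to(device`. This is a sketch; the function name `find_device_lines` is our own, not part of the package.

```python
import re
from typing import List, Tuple

# The three patterns mentioned in this thread for spotting GPU placement.
DEVICE_PATTERNS = re.compile(r"device=device|cuda:0|\.to\(device")


def find_device_lines(source: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped line) for each matching line."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if DEVICE_PATTERNS.search(line)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python find_device.py moverscore.py moverscore_v2.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for lineno, line in find_device_lines(f.read()):
                print(f"{path}:{lineno}: {line}")
```

Running it on both files side by side should make the missing device moves in v2 easy to spot.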
Maybe you could submit a PR to them.
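Such a PR would essentially add the three `.to(device=device)` calls from MoverScore to the matching spot in MoverScore_v2. A minimal sketch of that change as a standalone helper, assuming `device` is available in that function (the name `move_batch_to_device` is hypothetical). It only calls `.to(device=...)`, so it works with `torch.Tensor` or any object exposing that method.

```python
from typing import Any, Tuple


def move_batch_to_device(padded: Any, padded_idf: Any, lens: Any, mask: Any,
                         tokens: Any, device: Any) -> Tuple[Any, ...]:
    """Mirror MoverScore: move padded, mask, and lens to `device`.

    padded_idf and tokens are returned unchanged, matching the
    MoverScore excerpt quoted in this thread.
    """
    padded = padded.to(device=device)
    mask = mask.to(device=device)
    lens = lens.to(device=device)
    return padded, padded_idf, lens, mask, tokens
```

In MoverScore_v2 this would replace the bare `return padded, padded_idf, lens, mask, tokens`, e.g. `return move_batch_to_device(padded, padded_idf, lens, mask, tokens, device)`.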