The dataset of WMT22 Metrics Shared Task extended with annotations of 28 types of Chinese Multiword Expressions (MWEs)
9 types of non-NE MWE (MWEs#): noun-headed idiom (NID), separable word (or ionized word) (ION), four-character idiom (IDI), special construction (CON), verb-headed idiom (VID), semi non-compositional verb-particle construction (VPC.semi), light verb construction with bleached verb (LVC.full), light verb construction with causative verb (LVC.cause), and multi-verb construction (MVC)
19 types of named entities (NEs): person name (PERSON), nationality, religious, political or ethnic group (NORP), facility (FAC), organization (ORG), geopolitical entity (GPE), location (LOC), product (PRODUCT), event (EVENT), work of art (WORK_OF_ART), law (LAW), language (LANGUAGE), date (DATE), time (TIME), percent (PERCENT), money (MONEY), quantity (QUANTITY), ordinal (ORDINAL), cardinal (CARDINAL), and terminology (TER)
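The two label sets above can be collected into a small lookup, which is handy for sanity-checking the counts (9 + 19 = 28) before working with the annotation files. This is only a sketch: the names `MWE_SHARP_TYPES`, `NE_TYPES`, and `label_group` are hypothetical and not part of `mtmeMWE.py`, LAW is treated as its own tag so that the NE list has 19 entries, and it is an assumption that the JSON files spell the tags exactly as listed here.

```python
# Label inventory transcribed from the two lists above.
# Assumption: the annotation files use exactly these tag spellings.
MWE_SHARP_TYPES = [
    "NID", "ION", "IDI", "CON", "VID",
    "VPC.semi", "LVC.full", "LVC.cause", "MVC",
]

NE_TYPES = [
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT",
    "MONEY", "QUANTITY", "ORDINAL", "CARDINAL", "TER",
]

# Map every tag to its group so a tag can be classified in one lookup.
LABEL_TO_GROUP = {tag: "MWE#" for tag in MWE_SHARP_TYPES}
LABEL_TO_GROUP.update({tag: "NE" for tag in NE_TYPES})


def label_group(tag: str) -> str:
    """Return 'MWE#' or 'NE' for a known tag; raise KeyError for an unknown one."""
    return LABEL_TO_GROUP[tag]


if __name__ == "__main__":
    print(len(MWE_SHARP_TYPES), len(NE_TYPES), len(LABEL_TO_GROUP))
    print(label_group("IDI"), label_group("GPE"))
```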
.
│  all.json
'''
Annotation results of all Chinese MWEs in the WMT22 zh-en source text (1,875 sentences) (Freitag et al., 2022),
with 'index' denoting the sentence index in the original data (starting from 0),
'property' denoting the presence of MWEs# or NEs, and items in 'catogory' denoting the specific
type, position, and content of the MWEs present.
'''
│  cat_list.json # List of all annotated Chinese MWE items grouped by category (not deduplicated).
│  mtmeMWE.py # Script implementing three statistical methods at both the property and category levels. Examples are provided in the file.
│
├─#dic # Dictionaries used for the annotation of MWE#.
│      dic_CDI_35458.txt # A Comprehensive Dictionary of Chinese/English Idioms (Zhang and Zhang, 2014)
│      dic_ID10M_dev_372.txt
│      dic_ID10M_test_80.txt
│      dic_ID10M_train_1835.txt # ID10M dataset (Tedeschi et al., 2022)
│      dic_PARSEME_4827.json
│      dic_PARSEME_4827.txt # PARSEME 1.3 (zh) (Savary et al., 2023)
│      dic_total_39978.json
│      dic_total_39978.txt
│
├─category level # Annotation results grouped by MWE category.
│      cat.json
│
├─property level # Annotation results grouped by MWE# and NE presence.
│      mwe#+ne.json
│      only_mwe#.json
│      only_ne.json
│      with.json
│      without.json
│      without_mwe#+ne.json
│      without_mwe#.json
│      without_ne.json
│      without_only_mwe#.json
│      without_only_ne.json
│      with_mwe#.json
│      with_ne.json
│
└─raw_wmt22 # Original dataset from the WMT22 Metrics Shared Task.
    ├─documents
    │      zh-en.docs
    │
    ├─human-scores
    │      zh-en.mqm.seg.score
    │      zh-en.wmt-appraise-z.seg.score
    │      zh-en.wmt-appraise.seg.score
    │      zh-en.wmt-z.seg.score
    │      zh-en.wmt.seg.score
    │
    ├─human-score_sd
    │      zh-en.mqm.domain.score
    │      zh-en.mqm.sys.score
    │      zh-en.wmt-appraise-z.sys.score
    │      zh-en.wmt-appraise.domain.score
    │      zh-en.wmt-appraise.sys.score
    │      zh-en.wmt-z.domain.score
    │      zh-en.wmt-z.sys.score
    │      zh-en.wmt.domain.score
    │      zh-en.wmt.sys.score
    │
    ├─human-score_sd-conversation
    │      zh-en.mqm.domain.score
    │      zh-en.mqm.sys.score
    │      zh-en.wmt-appraise.sys.score
    │      zh-en.wmt-z.domain.score
    │      zh-en.wmt-z.sys.score
    │      zh-en.wmt.domain.score
    │      zh-en.wmt.sys.score
    │
    ├─human-score_sd-ecommerce
    │      zh-en.mqm.domain.score
    │      zh-en.mqm.sys.score
    │      zh-en.wmt-appraise-z.sys.score
    │      zh-en.wmt-appraise.domain.score
    │      zh-en.wmt-appraise.sys.score
    │      zh-en.wmt-z.domain.score
    │      zh-en.wmt-z.sys.score
    │      zh-en.wmt.domain.score
    │      zh-en.wmt.sys.score
    │
    ├─human-score_sd-news
    │      zh-en.mqm.domain.score
    │      zh-en.mqm.sys.score
    │      zh-en.wmt-appraise-z.sys.score
    │      zh-en.wmt-appraise.domain.score
    │      zh-en.wmt-appraise.sys.score
    │      zh-en.wmt-z.domain.score
    │      zh-en.wmt-z.sys.score
    │      zh-en.wmt.domain.score
    │      zh-en.wmt.sys.score
    │
    ├─human-score_sd-social
    │      zh-en.mqm.domain.score
    │      zh-en.mqm.sys.score
    │      zh-en.wmt-appraise-z.sys.score
    │      zh-en.wmt-appraise.domain.score
    │      zh-en.wmt-appraise.sys.score
    │      zh-en.wmt-z.domain.score
    │      zh-en.wmt-z.sys.score
    │      zh-en.wmt.domain.score
    │      zh-en.wmt.sys.score
    │
    ├─metric-scores
    │  ├─zh-en_all
    │  │      BERTScore-refA.seg.score
    │  │      BERTScore-refB.seg.score
    │  │      BLEU-refA.seg.score
    │  │      BLEU-refB.seg.score
    │  │      BLEURT-20-refA.seg.score
    │  │      BLEURT-20-refB.seg.score
    │  │      chrF-refA.seg.score
    │  │      chrF-refB.seg.score
    │  │      COMET-20-refA.seg.score
    │  │      COMET-20-refB.seg.score
    │  │      COMET-22-refA.seg.score
    │  │      COMET-22-refB.seg.score
    │  │      COMET-QE-src.seg.score
    │  │      COMETKiwi-src.seg.score
    │  │      Cross-QE-src.seg.score
    │  │      f101spBLEU-refA.seg.score
    │  │      f101spBLEU-refB.seg.score
    │  │      f200spBLEU-refA.seg.score
    │  │      f200spBLEU-refB.seg.score
    │  │      HWTSC-Teacher-Sim-src.seg.score
    │  │      HWTSC-TLM-src.seg.score
    │  │      KG-BERTScore-src.seg.score
    │  │      MATESE-QE-src.seg.score
    │  │      MATESE-refA.seg.score
    │  │      MATESE-refB.seg.score
    │  │      MEE-refA.seg.score
    │  │      MEE-refB.seg.score
    │  │      MEE2-refA.seg.score
    │  │      MEE2-refB.seg.score
    │  │      MEE4-refA.seg.score
    │  │      MEE4-refB.seg.score
    │  │      metricx_xl_DA_2019-refA.seg.score
    │  │      metricx_xl_DA_2019-refB.seg.score
    │  │      MS-COMET-22-refA.seg.score
    │  │      MS-COMET-22-refB.seg.score
    │  │      MS-COMET-QE-22-src.seg.score
    │  │      REUSE-src.seg.score
    │  │      SEScore-refA.seg.score
    │  │      SEScore-refB.seg.score
    │  │      UniTE-ref-refA.seg.score
    │  │      UniTE-ref-refB.seg.score
    │  │      UniTE-refA.seg.score
    │  │      UniTE-refB.seg.score
    │  │      UniTE-src-src.seg.score
    │  │      YiSi-1-refA.seg.score
    │  │      YiSi-1-refB.seg.score
    │  │
    │  ├─zh-en_qe
    │  │      COMET-QE-src.seg.score
    │  │      COMETKiwi-src.seg.score
    │  │      Cross-QE-src.seg.score
    │  │      HWTSC-Teacher-Sim-src.seg.score
    │  │      HWTSC-TLM-src.seg.score
    │  │      KG-BERTScore-src.seg.score
    │  │      MATESE-QE-src.seg.score
    │  │      MS-COMET-QE-22-src.seg.score
    │  │      REUSE-src.seg.score
    │  │      UniTE-src-src.seg.score
    │  │
    │  └─zh-en_ref
    │          BERTScore-refA.seg.score
    │          BERTScore-refB.seg.score
    │          BLEU-refA.seg.score
    │          BLEU-refB.seg.score
    │          BLEURT-20-refA.seg.score
    │          BLEURT-20-refB.seg.score
    │          chrF-refA.seg.score
    │          chrF-refB.seg.score
    │          COMET-20-refA.seg.score
    │          COMET-20-refB.seg.score
    │          COMET-22-refA.seg.score
    │          COMET-22-refB.seg.score
    │          f101spBLEU-refA.seg.score
    │          f101spBLEU-refB.seg.score
    │          f200spBLEU-refA.seg.score
    │          f200spBLEU-refB.seg.score
    │          MATESE-refA.seg.score
    │          MATESE-refB.seg.score
    │          MEE-refA.seg.score
    │          MEE-refB.seg.score
    │          MEE2-refA.seg.score
    │          MEE2-refB.seg.score
    │          MEE4-refA.seg.score
    │          MEE4-refB.seg.score
    │          metricx_xl_DA_2019-refA.seg.score
    │          metricx_xl_DA_2019-refB.seg.score
    │          MS-COMET-22-refA.seg.score
    │          MS-COMET-22-refB.seg.score
    │          SEScore-refA.seg.score
    │          SEScore-refB.seg.score
    │          UniTE-ref-refA.seg.score
    │          UniTE-ref-refB.seg.score
    │          UniTE-refA.seg.score
    │          UniTE-refB.seg.score
    │          YiSi-1-refA.seg.score
    │          YiSi-1-refB.seg.score
    │
    ├─references
    │      zh-en.refA.txt
    │      zh-en.refB.txt
    │
    ├─sources
    │      zh-en.txt
    │
    └─system-outputs
        └─zh-en
                AISP-SJTU.txt
                bleurt_bestmbr.txt
                bleu_bestmbr.txt
                chrf_bestmbr.txt
                comet_bestmbr.txt
                DLUT.txt
                HuaweiTSC.txt
                JDExploreAcademy.txt
                Lan-Bridge.txt
                LanguageX.txt
                M2M100_1.2B-B4.txt
                NiuTrans.txt
                Online-A.txt
                Online-B.txt
                Online-G.txt
                Online-W.txt
                Online-Y.txt
                QUARTZ_TuneReranking.txt
                refA.txt
                refB.txt
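
As a rough illustration of how the annotations and the score files in the tree above might be combined, the sketch below selects sentence indices by their 'property' value and keeps only the matching segment scores. It is not the method implemented in `mtmeMWE.py`. Several points are assumptions: that `all.json` holds a list of per-sentence records with 'index' and 'property' keys, that each `*.seg.score` line is tab-separated with the system name first and the score last, and that each system's lines follow the source sentence order. The function names and the property labels "A" and "B" in the demo are placeholders, not values from the dataset.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_annotations(path):
    """Load an annotation file; assumed to be a JSON list of per-sentence dicts."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def indices_with_property(records, wanted):
    """Return the sorted sentence indices whose 'property' value is in `wanted`."""
    return sorted(r["index"] for r in records if r["property"] in wanted)


def parse_seg_scores(lines):
    """Group segment scores by system.

    Assumes tab-separated lines with the system name in the first column
    and the score in the last; a literal 'None' marks a missing score.
    """
    scores = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("\t")
        system, value = parts[0], parts[-1]
        scores[system].append(None if value == "None" else float(value))
    return dict(scores)


def subset_scores(scores, indices):
    """Keep, for every system, only the scores at the given sentence indices."""
    return {system: [vals[i] for i in indices] for system, vals in scores.items()}


if __name__ == "__main__":
    # Toy data with placeholder property labels; not taken from the dataset.
    records = [
        {"index": 0, "property": "A"},
        {"index": 1, "property": "B"},
        {"index": 2, "property": "A"},
    ]
    lines = ["sysX\t0.1", "sysX\t0.2", "sysX\tNone",
             "sysY\t1.0", "sysY\t2.0", "sysY\t3.0"]
    chosen = indices_with_property(records, {"A"})
    print(subset_scores(parse_seg_scores(lines), chosen))
```

With real data, `load_annotations` would be pointed at `all.json` and the lines read from one of the files under `raw_wmt22`, after checking both against the assumptions stated above.
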
Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, Eleftherios Avramidis, Tom Kocmi, George Foster, Alon Lavie, and André F. T. Martins. 2022. Results of WMT22 metrics shared task: Stop using BLEU - neural metrics are better and more robust. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 46–68, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga Güngör, Thomas Pickard, Bruno Guillaume, Eduard Bejček, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa Iñurrieta, Albert Gatt, Jolanta Kovalevskaite, Timm Lichte, Nikola Ljubešić, Johanna Monti, Carla Parra Escartín, Mehrnoush Shamsfard, Ivelina Stoyanova, Veronika Vincze, and Abigail Walsh. 2023. PARSEME corpus release 1.3. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 24–35, Dubrovnik, Croatia. Association for Computational Linguistics.
Simone Tedeschi, Federico Martelli, and Roberto Navigli. 2022. ID10M: Idiom identification in 10 languages. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2715–2726, Seattle, United States. Association for Computational Linguistics.
Xueying Zhang and Hui Zhang. 2014. A Comprehensive Dictionary of Chinese/English Idioms with English/Chinese Translations. Tsinghua University Press, Beijing.