
mtme-zh-mwe


The dataset of the WMT22 Metrics Shared Task, extended with annotations of 28 types of Chinese Multiword Expressions (MWEs) and the MQM-based translation errors associated with them.

πŸ“Œ 9 types of non-NE MWEs (MWE#): noun-headed idiom (NID), separable word (or ionized word) (ION), four-character idiom (IDI), special construction (CON), verb-headed idiom (VID), semi non-compositional verb-particle construction (VPC.semi), light verb construction with bleached verb (LVC.full), light verb construction with causative verb (LVC.cause), and multi-verb construction (MVC).

πŸ“Œ 19 types of named entities (NEs): person name (PERSON), nationality, religious, political or ethnic group (NORP), facility (FAC), organization (ORG), geopolitical entity (GPE), location (LOC), product (PRODUCT), event (EVENT), work of art (WORK_OF_ART), law (LAW), language (LANGUAGE), date (DATE), time (TIME), percent (PERCENT), money (MONEY), quantity (QUANTITY), ordinal (ORDINAL), cardinal (CARDINAL), and terminology (TER).

πŸ—‚οΈ Description of file structure

.
β”‚  all.json
'''
Annotation results for all Chinese MWEs in the WMT22 zh-en source text (1,875 sentences) (Freitag et al., 2022),
with 'index' denoting the sentence index in the original data (starting from 0),
'property' denoting the presence of MWE# or NEs, and items in 'category' denoting the specific
type, position, and content of the MWEs present (a loading sketch is given below the tree).
'''
β”‚  cat_list.json  # List of all annotated Chinese MWE items, grouped by category (not deduplicated).
β”‚  mtmeMWE.py  # Script implementing three statistical methods at both the property and category levels; usage examples are provided in the file.
β”‚  
β”œβ”€#dic  # Dictionaries used for annotating MWE#.
β”‚      dic_CDI_35458.txt  # A Comprehensive Dictionary of Chinese/English Idioms (Zhang and Zhang, 2014)
β”‚      dic_ID10M_dev_372.txt 
β”‚      dic_ID10M_test_80.txt
β”‚      dic_ID10M_train_1835.txt  # ID10M dataset (Tedeschi et al., 2022)
β”‚      dic_PARSEME_4827.json
β”‚      dic_PARSEME_4827.txt  # PARSEME 1.3 (zh) (Savary et al., 2023)
β”‚      dic_total_39978.json
β”‚      dic_total_39978.txt
β”‚      
β”œβ”€category level  # Annotation results grouped by MWE categories.
β”‚      cat.json
β”‚      
β”œβ”€property level  # Annotation results grouped by MWE# and NE presence.
β”‚      mwe#+ne.json
β”‚      only_mwe#.json
β”‚      only_ne.json
β”‚      with.json
β”‚      without.json
β”‚      without_mwe#+ne.json
β”‚      without_mwe#.json
β”‚      without_ne.json
β”‚      without_only_mwe#.json
β”‚      without_only_ne.json
β”‚      with_mwe#.json
β”‚      with_ne.json
β”‚      
└─raw_wmt22  # Original dataset from the WMT22 Metrics Shared Task.
    β”œβ”€documents
    β”‚      zh-en.docs
    β”‚      
    β”œβ”€human-scores
    β”‚      zh-en.mqm.seg.score
    β”‚      zh-en.wmt-appraise-z.seg.score
    β”‚      zh-en.wmt-appraise.seg.score
    β”‚      zh-en.wmt-z.seg.score
    β”‚      zh-en.wmt.seg.score
    β”‚      
    β”œβ”€human-score_sd
    β”‚      zh-en.mqm.domain.score
    β”‚      zh-en.mqm.sys.score
    β”‚      zh-en.wmt-appraise-z.sys.score
    β”‚      zh-en.wmt-appraise.domain.score
    β”‚      zh-en.wmt-appraise.sys.score
    β”‚      zh-en.wmt-z.domain.score
    β”‚      zh-en.wmt-z.sys.score
    β”‚      zh-en.wmt.domain.score
    β”‚      zh-en.wmt.sys.score
    β”‚      
    β”œβ”€human-score_sd-conversation
    β”‚      zh-en.mqm.domain.score
    β”‚      zh-en.mqm.sys.score
    β”‚      zh-en.wmt-appraise.sys.score
    β”‚      zh-en.wmt-z.domain.score
    β”‚      zh-en.wmt-z.sys.score
    β”‚      zh-en.wmt.domain.score
    β”‚      zh-en.wmt.sys.score
    β”‚      
    β”œβ”€human-score_sd-ecommerce
    β”‚      zh-en.mqm.domain.score
    β”‚      zh-en.mqm.sys.score
    β”‚      zh-en.wmt-appraise-z.sys.score
    β”‚      zh-en.wmt-appraise.domain.score
    β”‚      zh-en.wmt-appraise.sys.score
    β”‚      zh-en.wmt-z.domain.score
    β”‚      zh-en.wmt-z.sys.score
    β”‚      zh-en.wmt.domain.score
    β”‚      zh-en.wmt.sys.score
    β”‚      
    β”œβ”€human-score_sd-news
    β”‚      zh-en.mqm.domain.score
    β”‚      zh-en.mqm.sys.score
    β”‚      zh-en.wmt-appraise-z.sys.score
    β”‚      zh-en.wmt-appraise.domain.score
    β”‚      zh-en.wmt-appraise.sys.score
    β”‚      zh-en.wmt-z.domain.score
    β”‚      zh-en.wmt-z.sys.score
    β”‚      zh-en.wmt.domain.score
    β”‚      zh-en.wmt.sys.score
    β”‚      
    β”œβ”€human-score_sd-social
    β”‚      zh-en.mqm.domain.score
    β”‚      zh-en.mqm.sys.score
    β”‚      zh-en.wmt-appraise-z.sys.score
    β”‚      zh-en.wmt-appraise.domain.score
    β”‚      zh-en.wmt-appraise.sys.score
    β”‚      zh-en.wmt-z.domain.score
    β”‚      zh-en.wmt-z.sys.score
    β”‚      zh-en.wmt.domain.score
    β”‚      zh-en.wmt.sys.score
    β”‚      
    β”œβ”€metric-scores
    β”‚  β”œβ”€zh-en_all
    β”‚  β”‚      BERTScore-refA.seg.score
    β”‚  β”‚      BERTScore-refB.seg.score
    β”‚  β”‚      BLEU-refA.seg.score
    β”‚  β”‚      BLEU-refB.seg.score
    β”‚  β”‚      BLEURT-20-refA.seg.score
    β”‚  β”‚      BLEURT-20-refB.seg.score
    β”‚  β”‚      chrF-refA.seg.score
    β”‚  β”‚      chrF-refB.seg.score
    β”‚  β”‚      COMET-20-refA.seg.score
    β”‚  β”‚      COMET-20-refB.seg.score
    β”‚  β”‚      COMET-22-refA.seg.score
    β”‚  β”‚      COMET-22-refB.seg.score
    β”‚  β”‚      COMET-QE-src.seg.score
    β”‚  β”‚      COMETKiwi-src.seg.score
    β”‚  β”‚      Cross-QE-src.seg.score
    β”‚  β”‚      f101spBLEU-refA.seg.score
    β”‚  β”‚      f101spBLEU-refB.seg.score
    β”‚  β”‚      f200spBLEU-refA.seg.score
    β”‚  β”‚      f200spBLEU-refB.seg.score
    β”‚  β”‚      HWTSC-Teacher-Sim-src.seg.score
    β”‚  β”‚      HWTSC-TLM-src.seg.score
    β”‚  β”‚      KG-BERTScore-src.seg.score
    β”‚  β”‚      MATESE-QE-src.seg.score
    β”‚  β”‚      MATESE-refA.seg.score
    β”‚  β”‚      MATESE-refB.seg.score
    β”‚  β”‚      MEE-refA.seg.score
    β”‚  β”‚      MEE-refB.seg.score
    β”‚  β”‚      MEE2-refA.seg.score
    β”‚  β”‚      MEE2-refB.seg.score
    β”‚  β”‚      MEE4-refA.seg.score
    β”‚  β”‚      MEE4-refB.seg.score
    β”‚  β”‚      metricx_xl_DA_2019-refA.seg.score
    β”‚  β”‚      metricx_xl_DA_2019-refB.seg.score
    β”‚  β”‚      MS-COMET-22-refA.seg.score
    β”‚  β”‚      MS-COMET-22-refB.seg.score
    β”‚  β”‚      MS-COMET-QE-22-src.seg.score
    β”‚  β”‚      REUSE-src.seg.score
    β”‚  β”‚      SEScore-refA.seg.score
    β”‚  β”‚      SEScore-refB.seg.score
    β”‚  β”‚      UniTE-ref-refA.seg.score
    β”‚  β”‚      UniTE-ref-refB.seg.score
    β”‚  β”‚      UniTE-refA.seg.score
    β”‚  β”‚      UniTE-refB.seg.score
    β”‚  β”‚      UniTE-src-src.seg.score
    β”‚  β”‚      YiSi-1-refA.seg.score
    β”‚  β”‚      YiSi-1-refB.seg.score
    β”‚  β”‚      
    β”‚  β”œβ”€zh-en_qe
    β”‚  β”‚      COMET-QE-src.seg.score
    β”‚  β”‚      COMETKiwi-src.seg.score
    β”‚  β”‚      Cross-QE-src.seg.score
    β”‚  β”‚      HWTSC-Teacher-Sim-src.seg.score
    β”‚  β”‚      HWTSC-TLM-src.seg.score
    β”‚  β”‚      KG-BERTScore-src.seg.score
    β”‚  β”‚      MATESE-QE-src.seg.score
    β”‚  β”‚      MS-COMET-QE-22-src.seg.score
    β”‚  β”‚      REUSE-src.seg.score
    β”‚  β”‚      UniTE-src-src.seg.score
    β”‚  β”‚      
    β”‚  └─zh-en_ref
    β”‚          BERTScore-refA.seg.score
    β”‚          BERTScore-refB.seg.score
    β”‚          BLEU-refA.seg.score
    β”‚          BLEU-refB.seg.score
    β”‚          BLEURT-20-refA.seg.score
    β”‚          BLEURT-20-refB.seg.score
    β”‚          chrF-refA.seg.score
    β”‚          chrF-refB.seg.score
    β”‚          COMET-20-refA.seg.score
    β”‚          COMET-20-refB.seg.score
    β”‚          COMET-22-refA.seg.score
    β”‚          COMET-22-refB.seg.score
    β”‚          f101spBLEU-refA.seg.score
    β”‚          f101spBLEU-refB.seg.score
    β”‚          f200spBLEU-refA.seg.score
    β”‚          f200spBLEU-refB.seg.score
    β”‚          MATESE-refA.seg.score
    β”‚          MATESE-refB.seg.score
    β”‚          MEE-refA.seg.score
    β”‚          MEE-refB.seg.score
    β”‚          MEE2-refA.seg.score
    β”‚          MEE2-refB.seg.score
    β”‚          MEE4-refA.seg.score
    β”‚          MEE4-refB.seg.score
    β”‚          metricx_xl_DA_2019-refA.seg.score
    β”‚          metricx_xl_DA_2019-refB.seg.score
    β”‚          MS-COMET-22-refA.seg.score
    β”‚          MS-COMET-22-refB.seg.score
    β”‚          SEScore-refA.seg.score
    β”‚          SEScore-refB.seg.score
    β”‚          UniTE-ref-refA.seg.score
    β”‚          UniTE-ref-refB.seg.score
    β”‚          UniTE-refA.seg.score
    β”‚          UniTE-refB.seg.score
    β”‚          YiSi-1-refA.seg.score
    β”‚          YiSi-1-refB.seg.score
    β”‚          
    β”œβ”€references
    β”‚      zh-en.refA.txt
    β”‚      zh-en.refB.txt
    β”‚      
    β”œβ”€sources
    β”‚      zh-en.txt
    β”‚      
    └─system-outputs
        └─zh-en
                AISP-SJTU.txt
                bleurt_bestmbr.txt
                bleu_bestmbr.txt
                chrf_bestmbr.txt
                comet_bestmbr.txt
                DLUT.txt
                HuaweiTSC.txt
                JDExploreAcademy.txt
                Lan-Bridge.txt
                LanguageX.txt
                M2M100_1.2B-B4.txt
                NiuTrans.txt
                Online-A.txt
                Online-B.txt
                Online-G.txt
                Online-W.txt
                Online-Y.txt
                QUARTZ_TuneReranking.txt
                refA.txt
                refB.txt
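
πŸ§ͺ Example usage

The snippet below is a minimal sketch, not part of mtmeMWE.py. It loads all.json and tallies the annotated items per type label. The field names ('index', 'property', 'category') are taken from the description in the tree above, and the internal layout of each 'category' item is an assumption; verify the exact keys and item shape against the file before relying on it.

```python
import json
from collections import Counter

# Load the sentence-level annotations from the repository root.
with open("all.json", encoding="utf-8") as f:
    sentences = json.load(f)

# Count how many of the 1,875 source sentences contain at least one
# annotated MWE, and tally the annotated items per type label.
type_counts = Counter()
n_with_mwe = 0
for sent in sentences:
    items = sent.get("category", [])
    if items:
        n_with_mwe += 1
    for item in items:
        # Each item is assumed to record (type, position, content), per
        # the description above; adapt the access if the items are dicts.
        label = item[0] if isinstance(item, (list, tuple)) else item.get("type")
        type_counts[label] += 1

print(f"{n_with_mwe}/{len(sentences)} sentences contain an annotated MWE")
for mwe_type, n in type_counts.most_common():
    print(f"{mwe_type}\t{n}")
```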

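A second sketch joins the property-level splits with the WMT22 human scores, for example to compare a system's mean MQM score on segments with and without an MWE#. The *.seg.score format (one tab-separated system/score line per segment, in corpus order, with "None" for unscored segments) and the presence of the 'index' field in the split files are assumptions based on the WMT22 data layout; adjust if your copy differs.

```python
import json
from collections import defaultdict
from statistics import mean

def read_seg_scores(path):
    # One "system<TAB>score" line per segment, in corpus order for each
    # system; "None" is treated as a missing score (assumed format).
    scores = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            system, score = line.rstrip("\n").split("\t")
            scores[system].append(None if score == "None" else float(score))
    return scores

def load_indices(path):
    # Collect the 0-based sentence indices of a property-level split,
    # assuming its entries carry the same 'index' field as all.json.
    with open(path, encoding="utf-8") as f:
        return {entry["index"] for entry in json.load(f)}

mqm = read_seg_scores("raw_wmt22/human-scores/zh-en.mqm.seg.score")
with_mwe = load_indices("property level/with_mwe#.json")

for system, segs in sorted(mqm.items()):
    inside = [s for i, s in enumerate(segs) if s is not None and i in with_mwe]
    outside = [s for i, s in enumerate(segs) if s is not None and i not in with_mwe]
    if inside and outside:
        print(f"{system}: MQM {mean(inside):.3f} (with MWE#) vs {mean(outside):.3f} (without)")
```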
References

Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, Eleftherios Avramidis, Tom Kocmi, George Foster, Alon Lavie, and AndrΓ© F. T. Martins. 2022. Results of WMT22 metrics shared task: Stop using BLEU – neural metrics are better and more robust. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 46–68, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.

Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga GΓΌngΓΆr, Thomas Pickard, Bruno Guillaume, Eduard Bejček, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa IΓ±urrieta, Albert Gatt, Jolanta KovalevskaitΔ—, Timm Lichte, Nikola LjubeΕ‘iΔ‡, Johanna Monti, Carla Parra EscartΓ­n, Mehrnoush Shamsfard, Ivelina Stoyanova, Veronika Vincze, and Abigail Walsh. 2023. PARSEME corpus release 1.3. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 24–35, Dubrovnik, Croatia. Association for Computational Linguistics.

Simone Tedeschi, Federico Martelli, and Roberto Navigli. 2022. ID10M: Idiom identification in 10 languages. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2715–2726, Seattle, United States. Association for Computational Linguistics.

Xueying Zhang and Hui Zhang. 2014. A Comprehensive Dictionary of Chinese/English Idioms with English/Chinese Translations. Tsinghua University Press, Beijing.