

PMIndiaSum

Overview

This repository includes the PMIndiaSum dataset under data/ and scripts for monolingual, cross-lingual, and multilingual baseline models under baselines/.

License

Our materials are released under the CC-BY-4.0 license. In other words, they can be freely shared and adapted as long as appropriate credit is given. Full license: https://creativecommons.org/licenses/by/4.0/. The data is derived from the PM India website, whose own license is available at https://www.pmindia.gov.in/en/website-policies/.

Reference

Our work was published in Findings of EMNLP 2023. If you use our code or corpus, please cite:

@inproceedings{urlana-etal-2023-pmindiasum,
    title = "{PMI}ndia{S}um: Multilingual and Cross-lingual Headline Summarization for Languages in {I}ndia",
    author = "Urlana, Ashok  and
      Chen, Pinzhen  and
      Zhao, Zheng  and
      Cohen, Shay  and
      Shrivastava, Manish  and
      Haddow, Barry",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.777",
    doi = "10.18653/v1/2023.findings-emnlp.777",
    pages = "11606--11628",
}