
Gazeta: Dataset for automatic summarization of Russian news


Gazeta dataset

Paper: Dataset for Automatic Summarization of Russian News

Download

Dropbox:

UPDATE:

Other sources:
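
Once a copy of the dataset is downloaded, a first sanity check might look like the sketch below. It assumes the files are in JSON Lines format (one JSON object per line) and that each record has `text` and `summary` fields; neither assumption is stated in this README, so check them against the actual files. The file name in the usage comment and the helper names `parse_jsonl` and `length_stats` are hypothetical.

```python
import json


def parse_jsonl(lines):
    """Yield one dict per non-empty line; `lines` can be an open file or any iterable of strings."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def length_stats(records, text_key="text", summary_key="summary"):
    """Compute average whitespace-token counts for articles and summaries.

    The field names are assumptions about the dataset schema.
    """
    count = 0
    text_words = 0
    summary_words = 0
    for record in records:
        count += 1
        text_words += len(record[text_key].split())
        summary_words += len(record[summary_key].split())
    if count == 0:
        return {"count": 0, "avg_text_words": 0.0, "avg_summary_words": 0.0}
    return {
        "count": count,
        "avg_text_words": text_words / count,
        "avg_summary_words": summary_words / count,
    }


# Usage (hypothetical file name):
# with open("gazeta_train.jsonl", encoding="utf-8") as f:
#     print(length_stats(parse_jsonl(f)))
```

Streaming the records through a generator keeps memory use flat, which helps if the files are large.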

Trained MBART model:

https://huggingface.co/IlyaGusev/mbart_ru_sum_gazeta
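
A minimal sketch of running this checkpoint with the Hugging Face `transformers` library is shown below. It assumes `transformers` and PyTorch are installed. The input limit of 600 tokens and `no_repeat_ngram_size=4` are illustrative defaults rather than values confirmed by this README, and the helper names `load_model` and `summarize` are hypothetical.

```python
MODEL_NAME = "IlyaGusev/mbart_ru_sum_gazeta"


def load_model(model_name=MODEL_NAME):
    """Download (or load from cache) the tokenizer and model."""
    # Imported lazily so that this module can be imported without transformers installed.
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained(model_name)
    model = MBartForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def summarize(text, tokenizer, model, max_input_tokens=600, no_repeat_ngram_size=4):
    """Generate an abstractive summary for a single article."""
    # Truncate long articles to the assumed input limit.
    encoded = tokenizer(
        [text],
        max_length=max_input_tokens,
        truncation=True,
        return_tensors="pt",
    )
    # Generate for the single-item batch and keep its first (only) output sequence.
    output_ids = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        no_repeat_ngram_size=no_repeat_ngram_size,
    )[0]
    return tokenizer.decode(output_ids, skip_special_tokens=True)
```

With an article string in `article`, `summarize(article, *load_model())` would return the generated summary. Loading is kept separate from generation so the model is downloaded once and reused across calls.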

Additional notes

  • Legal basis for distribution of the dataset:
  • Cleaning: Colab notebook
  • Data analysis: Colab notebook
  • Summarization methods: Colab notebook
  • Other Russian summarization datasets:

Contacts

Citation

@InProceedings{Gusev2020gazeta,
    author="Gusev, Ilya",
    title="Dataset for Automatic Summarization of Russian News",
    booktitle="Artificial Intelligence and Natural Language",
    year="2020",
    publisher="Springer International Publishing",
    address="Cham",
    pages="{122--134}",
    isbn="978-3-030-59082-6",
    doi={10.1007/978-3-030-59082-6_9}
}