This repository contains the implementation of summarization models for Indian languages. The code and data are available in this repo.
We use a modified fork of huggingface transformers for our experiments.
If you are using conda, create the environment with:
conda env create -f environment.yml
Otherwise, create a Python environment and install the dependencies with:
pip install -r requirements.txt
We used the dataset released in the ILSUM shared task.
Make sure to create `train`, `dev`, and `test` CSV files with the column names "text" and "summary".
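As a reference for the expected file format, here is a minimal sketch using only the Python standard library; the file name and example row are illustrative, not taken from the repo:

```python
import csv

# Illustrative row; the real files hold article/summary pairs in the target language.
rows = [
    {"text": "Full article text goes here ...", "summary": "Short summary here."},
]

# Write train.csv with the expected "text" and "summary" column names.
with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "summary"])
    writer.writeheader()
    writer.writerows(rows)

# Sanity check: the header must match what the training script expects.
with open("train.csv", encoding="utf-8") as f:
    header = next(csv.reader(f))
print(header)  # ['text', 'summary']
```

The `dev` and `test` files follow the same layout.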
For Hindi and Gujarati
For English
To fine-tune any Hugging Face model, use the run.sh script. When running the different models described in the paper, make sure to pass the appropriate arguments.
sh run.sh
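The exact flags depend on how run.sh wraps the training code; assuming it forwards arguments to Hugging Face's run_summarization.py example script, an invocation might look like the following (the model name, file paths, and output directory are placeholders, not values from the repo):

```shell
# Hypothetical invocation; adapt the model, data files, and output path to your setup.
sh run.sh \
  --model_name_or_path google/mt5-base \
  --train_file train.csv \
  --validation_file dev.csv \
  --text_column text \
  --summary_column summary \
  --output_dir output/ \
  --do_train --do_eval
```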
If you use our code or corpus, please cite:
@article{urlana2023indian,
  title={Indian language summarization using pretrained sequence-to-sequence models},
  author={Urlana, Ashok and Bhatt, Sahil Manoj and Surange, Nirmal and Shrivastava, Manish},
  journal={arXiv preprint arXiv:2303.14461},
  year={2023}
}