Indian-Language-Summarization

This repository contains the implementation of summarization models for Indian languages. The code and data are available in this repository.

We use a modified fork of Hugging Face Transformers for our experiments.

Creating the environment

If you are using conda, use the following command:

conda env create -f environment.yml

Otherwise, to set up a Python environment with pip, use:

pip install -r requirements.txt

Data format:

  • We used the dataset released in the ILSUM shared task

  • Make sure to create train, dev, and test CSV files with the column names "text" and "summary"
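The data layout above can be sketched in Python. This is a minimal illustration using only the standard library, not code from this repository: the helper names `write_split` and `check_split` are hypothetical, and the file names `train.csv`, `dev.csv`, and `test.csv` are assumptions, since the README only specifies the split names and the two column names.

```python
import csv

# Column names required by the data format described above.
REQUIRED_COLUMNS = ("text", "summary")


def write_split(path, rows):
    """Write (text, summary) pairs to a CSV file with the expected header.

    Hypothetical helper for illustration; not part of the repository.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=REQUIRED_COLUMNS)
        writer.writeheader()
        for text, summary in rows:
            writer.writerow({"text": text, "summary": summary})


def check_split(path):
    """Return the number of data rows in the file.

    Raises ValueError if a required column is missing from the header.
    Hypothetical helper for illustration; not part of the repository.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"{path}: missing columns {missing}")
        return sum(1 for _ in reader)


if __name__ == "__main__":
    # Assumed file names; adjust them to whatever run.sh expects.
    for name in ("train.csv", "dev.csv", "test.csv"):
        print(name, check_split(name))
```

Running a check like this before fine-tuning may catch a misnamed column early, rather than partway through a training run.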

Models:

For Hindi and Gujarati

For English

Run the script

To fine-tune any Hugging Face model, use the run.sh script. To run one of the models described in the paper, pass the arguments appropriate to that model.

sh run.sh

Reference

If you use our code or corpus, please cite:

@article{urlana2023indian,
  title={Indian language summarization using pretrained sequence-to-sequence models},
  author={Urlana, Ashok and Bhatt, Sahil Manoj and Surange, Nirmal and Shrivastava, Manish},
  journal={arXiv preprint arXiv:2303.14461},
  year={2023}
}