/text_summarization_major

Part 1 of our Major academic project on Automatic Text Summarization

Text Summarization Algorithms

A Comparative Study

Developed as a part of our Semester 7 Major Project, this repository contains scripts and code to run and test the performance of popular text summarization algorithms. The algorithms studied are:

Dataset

For our experiments, the Opinosis dataset was used. It can be obtained here

@inproceedings{ganesan2010opinosis,
 title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},
 author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},
 booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},
 pages={340--348},
 year={2010},
 organization={Association for Computational Linguistics}
}

Performance Metric

To compare the relative performance of the algorithms, a simple Python implementation of the ROUGE-1 metric (unigram overlap between a generated summary and a reference summary) was used.
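As a rough illustration of what a unigram-overlap score computes, here is a minimal sketch. It is not the repository's rogue_one.py: the function name `rouge_1`, the lowercase whitespace tokenization, and the choice to report precision, recall, and F1 are all assumptions made for this example.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap scores between a candidate and a reference summary.

    Illustrative sketch only: tokenization is lowercase whitespace
    splitting, with no stemming or stopword removal (an assumption).
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it
    # appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / float(cand_total) if cand_total else 0.0
    recall = overlap / float(ref_total) if ref_total else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, scoring the candidate "the cat sat" against the reference "the cat sat on the mat" gives a precision of 1.0 (every candidate word appears in the reference) and a recall of 0.5 (three of the six reference words are covered).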

Replicating project results

To reproduce the results of our project, do the following:

  1. Clone this repository and ensure that the Opinosis Dataset is present. If it is not, download it from the link above and extract it into data/.

  2. Run the run-project script.

    $ sh run-project.sh

    This script will clean the dataset, extract keywords, run the algorithms on the dataset, and print their respective running times and ROUGE-1 scores.

    • To evaluate a single algorithm, first run the $algorithm/$algorithm.py script, then score its output with the rogue_one script:
    $ python rogue_one.py --gold data/summaries_keywords --test $algorithm/results
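The two-step evaluation of a single algorithm can also be driven from Python. The sketch below is hypothetical and not part of the repository: the helper names `build_commands` and `run_and_time` are invented for illustration, and it assumes both scripts are launched from the repository root with `python` on the PATH.

```python
import subprocess
import time


def build_commands(algorithm, gold_dir="data/summaries_keywords"):
    """Return the summarization and scoring commands for one algorithm.

    `algorithm` is the directory/script name that $algorithm stands for
    in the commands above.
    """
    summarize = ["python", "{0}/{0}.py".format(algorithm)]
    score = [
        "python", "rogue_one.py",
        "--gold", gold_dir,
        "--test", "{0}/results".format(algorithm),
    ]
    return summarize, score


def run_and_time(algorithm):
    """Run the algorithm, then the scorer; return the algorithm's wall time."""
    summarize, score = build_commands(algorithm)
    start = time.time()
    subprocess.check_call(summarize)  # raises CalledProcessError on failure
    elapsed = time.time() - start
    subprocess.check_call(score)      # scorer prints its own output
    return elapsed
```

Calling `run_and_time` with an algorithm's directory name runs both steps in order and returns the elapsed seconds for the summarization step only, so the scoring time is not counted against the algorithm.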

Dependencies

  • Python 2.7+
  • nltk
  • sumy
  • networkx