Towards Coherent Single-Document Automatic Text Summarization: An Integer Linear Programming-based Approach
An automatic text summarization tool developed as a Bachelor's thesis in Computer Science at the Federal Rural University of Pernambuco (UFRPE).
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Java JDK - The software development environment used for building the application
- Gurobi - The Linear Programming (LP) optimization solver used in the project
- ROUGE - The software package used for evaluating the generated summaries
First, install Gurobi on your machine and obtain a license for it. See the documentation at:
https://www.gurobi.com/documentation/8.0/quickstart_mac/obtaining_a_gurobi_license.html
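Before running the summarizer, it can help to confirm that a license file is where you expect it. The following is a minimal sketch, not part of this project. It assumes that the GRB_LICENSE_FILE environment variable, when set, points to the license file, and that otherwise a file named gurobi.lic in your home directory is used. Gurobi also searches other platform-specific locations that this sketch does not cover. The class name GurobiLicenseCheck is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the project.
public class GurobiLicenseCheck {

    // Assumption: GRB_LICENSE_FILE takes precedence when set;
    // otherwise fall back to gurobi.lic in the user's home directory.
    static Path resolveLicensePath(String envValue, String userHome) {
        if (envValue != null && !envValue.trim().isEmpty()) {
            return Paths.get(envValue.trim());
        }
        return Paths.get(userHome, "gurobi.lic");
    }

    public static void main(String[] args) {
        Path license = resolveLicensePath(
                System.getenv("GRB_LICENSE_FILE"),
                System.getProperty("user.home"));
        if (Files.isRegularFile(license)) {
            System.out.println("Found license file: " + license);
        } else {
            System.out.println("No license file at: " + license);
        }
    }
}
```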
Clone this repository on your local machine.
$ git clone https://github.com/CarlosRodrigo/text-summarizer-ilp.git
Once you have the repository on your local machine you can open the project with your favorite IDE.
All the project dependencies are inside the project folder located in:
libs
The main project file is located in:
src.br.ufrpe.summarization.Summarizer
Run this class as a regular Java application. The console then prompts for a series of inputs through which you can pass different parameters to the summarizer.
You should provide the location of the dataset and the location where you want to save the summaries. The project already bundles the dataset in a folder that is used by default. The dataset can be found in:
datasets.duc.data
There you will find two folders: duc-2001-stanford and duc-2002-stanford. Each folder contains the complete dataset of the corresponding DUC competition, annotated with the CoreNLP framework, which is used by the summarization algorithm.
Once you run the application, the summaries are written by default to:
datasets.duc.system-summaries
Here you have the same structure as before, with two folders: duc-2001 and duc-2002. They contain the generated summaries for their respective datasets, duc-2001-stanford and duc-2002-stanford.
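The relationship between the dataset folders and the summary folders can be sketched as follows. This is an illustration only, not code from the project. It assumes that the dotted locations above denote nested directories relative to the repository root, and that each summary folder name is the dataset folder name without the -stanford suffix. The class DatasetLayout and its methods are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the project.
public class DatasetLayout {

    static final String ANNOTATION_SUFFIX = "-stanford";

    // Assumption: a dotted location such as "datasets.duc.data"
    // stands for the nested directory datasets/duc/data.
    static Path toPath(String dottedLocation) {
        return Paths.get("", dottedLocation.split("\\."));
    }

    // Assumption: the summary folder is the dataset folder name
    // with the "-stanford" suffix removed.
    static String summaryFolderFor(String datasetFolder) {
        if (!datasetFolder.endsWith(ANNOTATION_SUFFIX)) {
            throw new IllegalArgumentException(
                    "Expected a folder ending in " + ANNOTATION_SUFFIX + ": " + datasetFolder);
        }
        return datasetFolder.substring(
                0, datasetFolder.length() - ANNOTATION_SUFFIX.length());
    }

    public static void main(String[] args) {
        Path data = toPath("datasets.duc.data");
        Path summaries = toPath("datasets.duc.system-summaries");
        for (String dataset : new String[] {"duc-2001-stanford", "duc-2002-stanford"}) {
            System.out.println(data.resolve(dataset)
                    + " -> " + summaries.resolve(summaryFolderFor(dataset)));
        }
    }
}
```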
After generating the summaries, use the ROUGE package to evaluate them. You may want to follow the quick-start steps provided in the ROUGE GitHub repository:
https://github.com/RxNLP/ROUGE-2.0#quick-start
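To give an intuition for what this evaluation measures, here is a minimal sketch of ROUGE-1 recall: the fraction of reference-summary unigrams that also appear in the system summary, with clipped counts. It is a simplification for illustration and not the implementation used by the ROUGE 2.0 package, which supports more metrics and preprocessing options. The class name Rouge1Sketch and the tokenization (lowercasing and splitting on non-word characters) are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not part of the project or of ROUGE 2.0.
public class Rouge1Sketch {

    // Counts lowercased unigrams, splitting on non-word characters.
    static Map<String, Integer> unigramCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // ROUGE-1 recall: clipped unigram overlap divided by the
    // number of unigrams in the reference summary.
    static double recall(String systemSummary, String referenceSummary) {
        Map<String, Integer> system = unigramCounts(systemSummary);
        Map<String, Integer> reference = unigramCounts(referenceSummary);
        int overlap = 0;
        int total = 0;
        for (Map.Entry<String, Integer> entry : reference.entrySet()) {
            total += entry.getValue();
            overlap += Math.min(entry.getValue(),
                    system.getOrDefault(entry.getKey(), 0));
        }
        return total == 0 ? 0.0 : (double) overlap / total;
    }

    public static void main(String[] args) {
        // 3 of the 6 reference unigrams are matched, so this prints 0.5
        System.out.println(recall("the cat sat", "the cat sat on the mat"));
    }
}
```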
- Carlos Rodrigo Garcia - CarlosRodrigo
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.