This repository contains a Text Summarization Tool implemented in Java, which was developed as a research and development (R&D) project for the Software Product Development (SPL 1) course.
Text summarization is the process of generating a concise and coherent summary of a longer text while preserving its key information. This tool provides three different extractive approaches to perform text summarization.
The primary objective of this research and development endeavor was to explore the field of text summarization, implement three different algorithms, and compare their performance.
The tool utilizes the following extractive approaches for text summarization:
- Word Frequency Algorithm: This approach ranks sentences based on the frequency of important words in each sentence. The sentences with higher frequencies of essential words are considered more relevant and are included in the summary.
- TextRank Algorithm: Inspired by the PageRank algorithm used by Google, TextRank treats each sentence as a node in a graph, with edges connecting sentences that share content. A sentence scores highly when it is strongly connected to other high-scoring sentences, and the highest-ranked sentences are selected to form the summary.
- Direct Method Algorithm: The Direct Method Algorithm uses a scoring function to rank sentences. The scores are calculated from several features, such as sentence length, position in the text, and keyword frequency. The top-scoring sentences are chosen for the summary.
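The word-frequency approach can be illustrated with a minimal sketch. It makes three assumptions: "important words" means words outside a small hand-picked stop-word list, frequencies are normalized by the most frequent word, and a sentence's score is the average normalized frequency of its content words. The class and method names (`WordFrequencyScorer`, `score`) are hypothetical and are not taken from the repository.

```java
import java.util.*;

final class WordFrequencyScorer {
    // Hypothetical stop-word list; a real tool would use a much larger one.
    private static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "the", "is", "are", "was", "of", "to", "in", "and", "on", "it", "for");

    // Lowercases the text and keeps alphabetic tokens that are not stop words.
    static List<String> contentWords(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                words.add(token);
            }
        }
        return words;
    }

    // Scores each sentence by the average normalized frequency of its content words.
    static double[] score(List<String> sentences) {
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences) {
            for (String w : contentWords(s)) {
                freq.merge(w, 1, Integer::sum);
            }
        }
        int maxFreq = freq.values().stream().max(Integer::compare).orElse(1);
        double[] scores = new double[sentences.size()];
        for (int i = 0; i < sentences.size(); i++) {
            List<String> words = contentWords(sentences.get(i));
            double sum = 0;
            for (String w : words) {
                sum += (double) freq.get(w) / maxFreq; // normalized to (0, 1]
            }
            scores[i] = words.isEmpty() ? 0.0 : sum / words.size();
        }
        return scores;
    }
}
```

This sketch averages instead of summing, so long sentences are not favored simply for containing more words. Summing is an equally plausible choice, and the README does not say which one the tool uses.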
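The README does not describe how the tool builds its TextRank graph. The following is a hedged sketch that assumes two ingredients:

- **Edge weights:** the word-overlap similarity from the original TextRank formulation, where the number of shared words is divided by the sum of the logarithms of the two sentence lengths.
- **Scoring:** the weighted PageRank update with a damping factor of 0.85, run for a fixed number of iterations instead of until convergence.

`TextRankScorer` and its methods are hypothetical names.

```java
import java.util.*;

final class TextRankScorer {
    // Distinct lowercase alphabetic tokens of a sentence.
    static Set<String> tokens(String sentence) {
        Set<String> set = new HashSet<>();
        for (String t : sentence.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) set.add(t);
        }
        return set;
    }

    // Overlap similarity: shared words / (log|Si| + log|Sj|); 0 when undefined.
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        double denom = Math.log(a.size()) + Math.log(b.size());
        return (shared.isEmpty() || denom <= 0) ? 0.0 : shared.size() / denom;
    }

    // Weighted PageRank over the sentence-similarity graph.
    static double[] rank(List<String> sentences, double damping, int iterations) {
        int n = sentences.size();
        List<Set<String>> toks = new ArrayList<>();
        for (String s : sentences) toks.add(tokens(s));

        // Build the symmetric weight matrix and each node's total outgoing weight.
        double[][] w = new double[n][n];
        double[] outWeight = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    w[i][j] = similarity(toks.get(i), toks.get(j));
                    outWeight[i] += w[i][j];
                }
            }
        }

        double[] score = new double[n];
        Arrays.fill(score, 1.0);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double sum = 0;
                for (int j = 0; j < n; j++) {
                    // Each neighbor j passes on a share of its score proportional to edge weight.
                    if (j != i && outWeight[j] > 0) {
                        sum += w[j][i] / outWeight[j] * score[j];
                    }
                }
                next[i] = (1 - damping) + damping * sum;
            }
            score = next;
        }
        return score;
    }
}
```

A sentence that shares no words with any other sentence receives no incoming weight, so its score stays at `1 - damping` (0.15 with the assumed damping factor).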
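The Direct Method description names its features but not how they are combined. The sketch below therefore assumes a linear combination, with made-up weights and an assumed "ideal" sentence length of 20 words. In this sketch:

- position decays linearly from the first sentence to the last;
- length saturates at the ideal length;
- keyword frequency is the fraction of a sentence's words that appear in a caller-supplied keyword set.

All names and constants (`DirectMethodScorer`, `W_POSITION`, `IDEAL_LENGTH`, and so on) are hypothetical.

```java
import java.util.*;

final class DirectMethodScorer {
    // Hypothetical feature weights; the README does not specify them.
    static final double W_POSITION = 0.3;
    static final double W_LENGTH = 0.2;
    static final double W_KEYWORD = 0.5;
    // Assumed length (in words) at which the length feature reaches its maximum.
    static final int IDEAL_LENGTH = 20;

    static List<String> tokenize(String sentence) {
        List<String> out = new ArrayList<>();
        for (String t : sentence.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Keywords are expected in lowercase, matching tokenize().
    static double[] score(List<String> sentences, Set<String> keywords) {
        int n = sentences.size();
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            List<String> words = tokenize(sentences.get(i));
            double position = 1.0 - (double) i / n; // earlier sentences score higher
            double length = Math.min(words.size(), IDEAL_LENGTH) / (double) IDEAL_LENGTH;
            long hits = words.stream().filter(keywords::contains).count();
            double keyword = words.isEmpty() ? 0.0 : (double) hits / words.size();
            scores[i] = W_POSITION * position + W_LENGTH * length + W_KEYWORD * keyword;
        }
        return scores;
    }
}
```

Changing the three weights shifts the balance between "lead" sentences, longer sentences, and keyword-dense sentences. That tuning is where direct-method variants usually differ.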
To assess the effectiveness of the three algorithms, performance evaluation is conducted using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric. ROUGE is a widely used metric in natural language processing and summarization tasks, which measures the quality of a summary by comparing it to one or more reference summaries.
The tool generates summaries using each algorithm and compares them against reference summaries using ROUGE scores. This evaluation allows us to gain insights into the strengths and weaknesses of each algorithm and determine which one performs better for different types of texts and summarization requirements.
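ROUGE-N compares n-gram counts between a generated summary and a reference summary:

- **Recall** is the number of overlapping n-grams divided by the number of n-grams in the reference.
- **Precision** divides the same overlap by the number of n-grams in the candidate.

Below is a minimal sketch for a single reference, using clipped counts and simple lowercase tokenization. It is not the official ROUGE toolkit, which also offers stemming, stop-word removal, and multiple references. `RougeN` is a hypothetical name, not a class from the repository.

```java
import java.util.*;

final class RougeN {
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Counts each n-gram, represented as its tokens joined by single spaces.
    static Map<String, Integer> ngrams(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            counts.merge(String.join(" ", tokens.subList(i, i + n)), 1, Integer::sum);
        }
        return counts;
    }

    // Returns {recall, precision, f1} for one candidate against one reference.
    static double[] score(String candidate, String reference, int n) {
        Map<String, Integer> cand = ngrams(tokenize(candidate), n);
        Map<String, Integer> ref = ngrams(tokenize(reference), n);
        int overlap = 0;
        for (Map.Entry<String, Integer> e : ref.entrySet()) {
            // Clip: an n-gram counts at most as often as it appears in both texts.
            overlap += Math.min(e.getValue(), cand.getOrDefault(e.getKey(), 0));
        }
        int refTotal = ref.values().stream().mapToInt(Integer::intValue).sum();
        int candTotal = cand.values().stream().mapToInt(Integer::intValue).sum();
        double recall = refTotal == 0 ? 0.0 : (double) overlap / refTotal;
        double precision = candTotal == 0 ? 0.0 : (double) overlap / candTotal;
        double f1 = (recall + precision) == 0 ? 0.0 : 2 * recall * precision / (recall + precision);
        return new double[] {recall, precision, f1};
    }
}
```

For example, with candidate "the cat sat on the mat" and reference "the cat lay on the mat":

- **ROUGE-1:** 5 of the 6 reference unigrams are matched, so recall, precision, and F1 are all 5/6.
- **ROUGE-2:** 3 of the 5 reference bigrams are matched, so recall is 0.6.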
To use the Text Summarization Tool, you can follow these steps:
- Clone the repository to your local machine.
git clone https://github.com/jaf107/text-summarization-tool.git
- Open the Java project in your preferred IDE.
- Run the main program, providing the input text that you want to summarize.
- The program will process the input text using the three extractive summarization algorithms.
- Finally, the program will display the summaries generated by each algorithm along with their ROUGE scores, allowing you to compare their performance.
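All three algorithms share the same outer pipeline: split the input into sentences, score each sentence, and keep the top-scoring ones. The sketch below shows that common step under stated assumptions:

- sentences are split with the JDK's `java.text.BreakIterator`;
- the *k* best sentences are re-emitted in their original order so the summary reads naturally.

The repository may split and assemble differently. `SummaryAssembler`, `splitSentences`, and `topK` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.*;

final class SummaryAssembler {
    // Splits raw text into trimmed sentences using the JDK's sentence BreakIterator.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) sentences.add(s);
        }
        return sentences;
    }

    // Keeps the k highest-scoring sentences and joins them in their original order.
    static String topK(List<String> sentences, double[] scores, int k) {
        Integer[] idx = new Integer[sentences.size()];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort indices by descending score; the sort is stable, so ties keep text order.
        Arrays.sort(idx, (a, b) -> Double.compare(scores[b], scores[a]));
        Integer[] chosen = Arrays.copyOf(idx, Math.min(k, idx.length));
        Arrays.sort(chosen); // restore original sentence order
        StringJoiner summary = new StringJoiner(" ");
        for (int i : chosen) summary.add(sentences.get(i));
        return summary.toString();
    }
}
```

Any of the three scorers could supply the `scores` array, which makes it straightforward to feed each algorithm's output into the same ROUGE comparison.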
For a deeper understanding of text summarization and the algorithms used in this tool, you can refer to the following resources:
Feel free to contribute to the repository by adding new features, improving existing algorithms, or suggesting enhancements.
Happy text summarization!