Authors:

Deepthi Girish Singapura
Varsha Chandan Bellara
Srushti Singatagere Basavaraj
Shiva Shankar Bidadi Nanjundaswamy

Project Name

Title: Extractive Text Summarization with Lexical Chain, Modified TextRank, TextRank and TConspectus for News Articles

Data Set:

Download the raw data sets from: http://mlg.ucd.ie/datasets/bbc.html

Preprocessing

Article Tokenization
Case folding of tokens
Stop word removal
Lemmatization
Removal of non-alphanumeric characters
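
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stop-word list is a tiny hypothetical sample, and naive_lemma is a crude suffix-stripping stand-in for a real lemmatizer (a full pipeline would more likely use a dictionary-based one such as NLTK's WordNetLemmatizer). Non-alphanumeric characters are dropped during tokenization here, which differs slightly from the order listed above.

```python
import re

# Hypothetical, deliberately small stop-word list (assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "of", "in", "on", "and", "to", "for"}


def naive_lemma(token):
    """Crude stand-in for lemmatization: strips a few common plural suffixes."""
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(article):
    """Return the cleaned token list for one article."""
    # Tokenization; keeping only runs of letters and digits also removes
    # non-alphanumeric characters.
    tokens = re.findall(r"[A-Za-z0-9]+", article)
    # Case folding.
    tokens = [t.lower() for t in tokens]
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization (naive approximation, see naive_lemma).
    return [naive_lemma(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("The markets were rising; shares of 3 companies fell."))
```

Swapping naive_lemma for a proper lemmatizer should only require changing the last line of preprocess.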

Usage

LexRank: python3 main.py
PageRank: python3 main.py
Lexical Chain: python3 main.py
Hybrid: python3 summarizer.py
Sumy: python3 extractSummary.py
Evaluate Summaries: python3 documentComparator.py

Algorithm Evaluation:

Generated summaries at each compression ratio (10%, 20%, 30%, 40%)
Represented documents as vectors
Compared all algorithm-generated summaries using cosine similarity
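
The evaluation steps above could look roughly like the sketch below. It makes several assumptions that this README does not confirm: the compression ratio is applied to the sentence count, documents are represented as raw term-frequency vectors, and each generated summary is compared against some reference text. The function names are hypothetical and are not taken from documentComparator.py.

```python
import math
import re
from collections import Counter


def summary_length(num_sentences, ratio):
    """Sentences to keep at a given compression ratio (always at least one)."""
    return max(1, round(num_sentences * ratio))


def to_vector(text):
    """Represent a document as a term-frequency vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    reference = "Shares fell as markets reacted to the new budget."
    generated = "Markets reacted to the budget and shares fell."
    score = cosine_similarity(to_vector(reference), to_vector(generated))
    print(f"{score * 100:.2f} %")
```

A score of 1.0 means the two texts use the same words in the same proportions, and 0.0 means they share no words, so the value can be reported as a percentage.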

Accuracy:

Compression ratio 10%

LexRank: 52.57%
PageRank: 51.18%
Lexical Chain: 74.36%
Hybrid Algorithm: 61.45%

Compression ratio 20%

LexRank: 63.12%
PageRank: 61.91%
Lexical Chain: 79.95%
Hybrid Algorithm: 70.86%

Compression ratio 30%

LexRank: 72.31%
PageRank: 71.11%
Lexical Chain: 84.21%
Hybrid Algorithm: 77.91%

Compression ratio 40%

LexRank: 78.59%
PageRank: 78.13%
Lexical Chain: 86.64%
Hybrid Algorithm: 83.87%