Figure 1.0 outlines the steps of the proposed methodology.
Figure 1.0 - The workflow of the proposed methodology.
The input data used for this project was sourced from various wiki platforms.
Libraries/Dependencies used: The following libraries are used in the code.
An automatic summarization module named sumy was used as a benchmark to analyze the results of the two algorithms.
The TextRank algorithm’s summarized output is short and precise, showing the most important sentences first according to the rank derived from the similarity matrix. Its main drawback is that the sentences appear out of their original order, which obscures the meaning of the document. It also does not resolve pronouns in the summary: sentences are taken verbatim from the document, so a pronoun can appear without the noun it refers to, and no attempt is made to arrange the summary in a meaningful order.
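The ranking behaviour described above can be illustrated with a minimal sketch. This is not the project's code: it assumes a simple word-overlap similarity (the project may have used a different measure, such as cosine similarity over sentence vectors), a naive regex sentence splitter, and hypothetical function names (`split_sentences`, `similarity`, `textrank`, `summarize`). The `keep_document_order` flag is an illustrative addition showing how the selected sentences could be restored to their original order.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared-word count, normalized by the log of each sentence's vocabulary size."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    return len(wa & wb) / denom if denom > 0 else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over the similarity matrix."""
    n = len(sentences)
    if n == 0:
        return []
    # Build the symmetric sentence-by-sentence similarity matrix.
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = similarity(sentences[i], sentences[j])
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / row_sums[j] * scores[j]
                for j in range(n)
                if row_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, k=2, keep_document_order=False):
    """Return the k top-ranked sentences, by rank or in document order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    if keep_document_order:
        top = sorted(top)  # restore the original sentence order
    return [sentences[i] for i in top]
```

Calling `summarize(text, k=3)` returns sentences in rank order, which is the behaviour criticized above; passing `keep_document_order=True` returns the same sentences in the order they appear in the source text.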
The Named Entity Recognition algorithm’s summarized output matches what the name suggests. It does the intended job, i.e. it recognizes the named entities (proper nouns, here the names of people) and replaces the pronouns in the article with the corresponding name. However, this replacement takes precedence over the actual purpose of producing a meaningful summary, and the output does not retain the grammatical correctness of the original article.
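The loss of grammatical correctness can be seen in a toy sketch of the replacement step. This is an illustration under stated assumptions, not the project's implementation: the set `person_names` stands in for the output of an NER model (a hypothetical input here), only single-word names are handled, and each pronoun is simply replaced by the most recently seen name.

```python
import re

# Third-person pronouns this toy example substitutes (assumed list).
PRONOUNS = {"he", "she", "him", "her", "his", "hers"}


def replace_pronouns(text, person_names):
    """Replace each pronoun with the most recently seen person name.

    `person_names` is a hypothetical stand-in for the entities an NER
    model would return; only single-token names are matched.
    """
    last_name = None
    out = []
    # Split into alternating word and non-word tokens so spacing is preserved.
    for token in re.findall(r"\w+|\W+", text):
        if token in person_names:
            last_name = token
            out.append(token)
        elif token.lower() in PRONOUNS and last_name is not None:
            out.append(last_name)
        else:
            out.append(token)
    return "".join(out)


print(replace_pronouns(
    "Ada wrote the first program. She published her notes in 1843.",
    {"Ada"},
))
# prints: Ada wrote the first program. Ada published Ada notes in 1843.
```

The possessive "her notes" becomes "Ada notes" rather than "Ada's notes", which is the kind of grammatical breakage described above.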