/Text-summarization-

Text Summarization

Primary LanguageJupyter Notebook

Text-summarization

image

Introduction

Text summarization can be defined as the process of extracting important sentences a large document and combining them into a short version of the document. Text summarization can be split into two types of summaries: extractive and abstractive :

  • Abstractive => which ai system tries to write the summary of a given text just like a human would do " first read the input text and try to write a summary".
  • Extractive => where we try to learn the system that learns about important sentence in the input text and just extract them "Try to achieve a model that make a text summarization as human ".

image

Summarizing a text can be both time-efficient and cost-efficient.

  • Time-efficient => means the human reader can spend less time reading a document by reading a summarized version
  • Cost-efficient => means summarizing can be used for compressing the amount of textual data being transferred from a device to another device.

Main Goal

The goal of classification is to classify the dataset before summarizing and test the summarized text on the trained models to see the accuracy increase or dicrease the same aspect for clustering, vlustering before and after and see the number of clusters

Main topics covered in methodology

  • Dataset

    • Integration
      • BBC news (business,entertainment,politics,sport,tech)
      • Covid 19 research papers
    • Visualization
    • Pre-processing
      • Lemmatization lower
      • remove stop words
      • remove unwanted chars (/n/n)
      • remove punctuation, HTML and URL
  • Modeling

    • LSA
    • T5
    • PEGASUS
    • Clustring
    • Classification (Linear SVM, MultiNomialNB, Logistic Regression)
  • Evaluation

Conclusion

  • Proposed a summarization system that use different techniques to summarize news + medical articles.
  • Used the summarized articles to do a classification job with different classification algorithms and compared its performance against the original articles.
  • The results shows high performance from the summarized articles with little difference against the original one, which proves that we can rely on them as a source of information.

contact

imagehttps://www.linkedin.com/in/%D9%90%D9%90alaa-elkhashap/