
Summarize text documents based on extraction.

Primary language: Python

text-summarization-extractive

Create summaries of text documents by unsupervised extraction of the most relevant sentences.

Implemented Methods of Extraction

  • extractive_simple.py
    Get the most relevant sentences using tf-idf scores
  • extractive_text_rank.py
    TextRank implementation (based on word vectors)
  • extractive_lex_rank.py
    LexRank implementation (cosine similarity of tf-idf vectors)
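The tf-idf and LexRank methods listed above both start from tf-idf sentence vectors. Below is a minimal, dependency-free sketch of the two ideas; it is not the code in extractive_simple.py or extractive_lex_rank.py. The helper names, the regex tokenizer, the smoothed idf formula, the centroid-based scoring in `simple_summary`, the similarity threshold of 0.1 and the damping factor of 0.85 are all assumptions chosen for illustration.

```python
import math
import re

# Hypothetical helpers illustrating tf-idf extraction and LexRank;
# they are not the functions used in this repository.


def tokenize(sentence):
    # Assumed tokenizer: lowercase alphabetic words only.
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_vectors(sentences):
    # Each sentence is treated as a "document" when computing idf.
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    df = {}
    for toks in tokens:
        for word in set(toks):
            df[word] = df.get(word, 0) + 1
    vectors = []
    for toks in tokens:
        tf = {}
        for word in toks:
            tf[word] = tf.get(word, 0.0) + 1.0 / len(toks)  # term frequency
        # Smoothed idf: log((1 + n) / (1 + df)) + 1.
        vectors.append({w: f * (math.log((1 + n) / (1 + df[w])) + 1) for w, f in tf.items()})
    return vectors


def cosine(a, b):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(sentences, scores, k):
    # Take the k best-scoring sentences, then restore document order.
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


def simple_summary(sentences, k=2):
    # Score each sentence by cosine similarity to the document centroid,
    # i.e. the sum of the length-normalized tf-idf sentence vectors.
    vectors = tfidf_vectors(sentences)
    centroid = {}
    for vec in vectors:
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for w, v in vec.items():
            centroid[w] = centroid.get(w, 0.0) + v / norm
    return top_k(sentences, [cosine(vec, centroid) for vec in vectors], k)


def lex_rank_summary(sentences, k=2, threshold=0.1, damping=0.85, iterations=50):
    vectors = tfidf_vectors(sentences)
    n = len(sentences)
    # Unweighted graph: link two different sentences whose similarity exceeds the threshold.
    adjacency = [[1.0 if i != j and cosine(vectors[i], vectors[j]) > threshold else 0.0
                  for j in range(n)] for i in range(n)]
    degrees = [sum(row) for row in adjacency]
    scores = [1.0 / n] * n
    # PageRank-style power iteration; isolated sentences keep only the base score.
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(adjacency[j][i] * scores[j] / degrees[j]
                                  for j in range(n) if degrees[j])
                  for i in range(n)]
    return top_k(sentences, scores, k)


sentences = [
    "Henry VIII was king of England.",
    "Henry VIII married six wives.",
    "The king of England ruled from London.",
    "Bananas are yellow.",
]
print(simple_summary(sentences, k=2))
print(lex_rank_summary(sentences, k=1))
```

Both functions return the selected sentences in their original order, which usually reads better than ranking order. LexRank is shown here in its thresholded (unweighted graph) variant; using the raw cosine values as edge weights is a common alternative.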

Data

The directories in /data:

  • source_texts:
    Excerpts of Wikipedia biographies covering three broad topics:
    • Tudor dynasty (marked with "a")
    • Midcentury architects / designers (marked with "b")
    • Stars of the silent movie era (marked with "c")
  • target_texts:
    Very short texts based on the source texts with varying similarity, marked according to the source texts. Also included are one text about a movie star who does not appear in the source texts and one text about "Charlie Brown" without any topic affiliation (marked with "d").

This data corresponds to: https://github.com/zushicat/text-topics

Download the pre-trained word vectors glove.6B.zip, unzip the file and place glove.6B.100d.txt in the /data directory:
https://nlp.stanford.edu/projects/glove/
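These vectors presumably feed the word-vector based TextRank method listed above. The sketch below assumes the standard GloVe text format (one word per line, followed by its space-separated vector components) and one common way to apply TextRank with word vectors: average the vectors of each sentence's known words, connect sentences with cosine-similarity edge weights, and rank them with PageRank-style power iteration. The original TextRank paper measured sentence similarity by word overlap instead; the function names, damping factor and iteration count here are hypothetical and not taken from extractive_text_rank.py.

```python
import math
import re

# Hypothetical helpers; not the repository's API.


def load_glove(path):
    # Each line of a GloVe .txt file is a word followed by its vector components.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def sentence_vector(sentence, vectors, dim):
    # Average the vectors of known (lowercase) words; unknown words are skipped.
    words = [w for w in re.findall(r"[a-z]+", sentence.lower()) if w in vectors]
    if not words:
        return [0.0] * dim
    return [sum(vectors[w][d] for w in words) / len(words) for d in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def text_rank_summary(sentences, vectors, k=2, damping=0.85, iterations=50):
    dim = len(next(iter(vectors.values())))
    sent_vecs = [sentence_vector(s, vectors, dim) for s in sentences]
    n = len(sentences)
    # Weighted graph without self-loops; negative similarities are clipped to 0.
    weights = [[max(cosine(sent_vecs[i], sent_vecs[j]), 0.0) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    # Weighted PageRank: each sentence passes its score on in proportion to edge weight.
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(weights[j][i] * scores[j] / out_sums[j]
                                  for j in range(n) if out_sums[j])
                  for i in range(n)]
    # Keep the k best sentences in document order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


# Assumed usage with the file placed as described above:
# vectors = load_glove("data/glove.6B.100d.txt")
# print(text_rank_summary(list_of_sentences, vectors, k=3))
```

Loading all 400,000 entries of glove.6B.100d.txt into Python lists takes a fair amount of memory; a NumPy array would be the usual choice for larger experiments.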
