This Python script provides a simple text summarization function built on spaCy, a popular natural language processing library, and word frequency analysis. The summarizer produces an extractive summary: it scores every sentence in the input text and keeps the highest-scoring ones.
Dependencies
- Ensure you have the following dependencies installed:
  - spaCy (pip install spacy)
  - spaCy's English language model (python -m spacy download en_core_web_md)
Function
- Import the necessary modules:

    import spacy
    from spacy.lang.en.stop_words import STOP_WORDS
    from string import punctuation
    from heapq import nlargest
Summarization Function
- Define the summarizer function:

    def summarizer(text):
        # Code for summarization goes here
        # ...
        return summary, len(text.split(' ')), len(summary.split(' '))
Input
- Call the summarizer function, passing the text you want to summarize as an argument:

    text_to_summarize = "Your input text goes here."
    summary, original_word_count, summary_word_count = summarizer(text_to_summarize)
Output
- summary: The summarized text
- original_word_count: The word count of the original text
- summary_word_count: The word count of the summary
Preprocessing
- Load the spaCy language model and tokenize the input text into words.
- Remove stop words and punctuation from the tokens.
- Calculate word frequencies.
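The preprocessing steps above could be sketched roughly as follows. This is an illustration, not the script's actual code: the helper names spacy_tokens and word_frequencies are hypothetical, lower-casing the words is an assumption, and spaCy is imported inside the function only so that the counting helper can be tried without the model installed.

```python
from collections import Counter
from string import punctuation


def spacy_tokens(text, model_name="en_core_web_md"):
    """Tokenize text with spaCy (hypothetical helper; needs the model installed)."""
    import spacy  # imported lazily so word_frequencies works without spaCy

    nlp = spacy.load(model_name)
    return [token.text for token in nlp(text)]


def word_frequencies(tokens, stop_words):
    """Count lower-cased tokens, skipping stop words, punctuation, and whitespace."""
    counts = Counter()
    for token in tokens:
        word = token.strip().lower()
        if not word or word in stop_words:
            continue
        # Treat a token as punctuation when every character is a punctuation mark.
        if all(ch in punctuation for ch in word):
            continue
        counts[word] += 1
    return counts
```

With spaCy available, word_frequencies(spacy_tokens(text), STOP_WORDS) would reuse the stop-word list from the imports shown earlier.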
Scoring Sentences
- Score each sentence based on the sum of the normalized frequencies of its constituent words.
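One plausible reading of this scoring step is sketched below. It assumes that "normalized" means dividing each count by the largest count, and that each sentence arrives as a list of word strings (with spaCy, these could come from iterating over doc.sents). The function names normalize and score_sentences are illustrative only.

```python
def normalize(frequencies):
    """Scale raw counts so the most frequent word has weight 1.0 (assumed scheme)."""
    if not frequencies:
        return {}
    top = max(frequencies.values())
    return {word: count / top for word, count in frequencies.items()}


def score_sentences(sentences, weights):
    """Map each sentence index to the sum of the weights of its words.

    sentences is a list of word lists; words missing from weights
    (stop words, punctuation) contribute nothing.
    """
    scores = {}
    for index, words in enumerate(sentences):
        scores[index] = sum(weights.get(word.lower(), 0.0) for word in words)
    return scores
```

Because the score is a plain sum, longer sentences tend to score higher than short ones; dividing by sentence length would be one way to offset that if it matters for your text.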
Summary Generation
- Select the top 30% of sentences based on their scores to form the summary.
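The selection step might look like the sketch below, which uses heapq.nlargest from the imports listed earlier. Two choices here are assumptions rather than something the script confirms: at least one sentence is always kept, and the selected sentences are joined in their original order. The name select_summary is hypothetical.

```python
from heapq import nlargest


def select_summary(sentences, scores, ratio=0.3):
    """Join the top-scoring sentences, keeping them in document order.

    sentences is a list of sentence strings; scores maps sentence
    index to score.
    """
    # Keep roughly `ratio` of the sentences, but never fewer than one.
    keep = max(1, int(len(sentences) * ratio))
    best = nlargest(keep, scores, key=scores.get)
    return " ".join(sentences[i] for i in sorted(best))
```

Joining in document order keeps the summary readable; joining in score order instead would only require dropping the sorted call.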
Return
- Return the summary and word counts for evaluation.
This code uses the spaCy library and its English language model. For more information, visit the spaCy website (https://spacy.io).
This code is provided under the MIT License.
Feel free to modify and use this code for your projects!