A stacked LSTM based Network for Text Summarization Using Keras

Extractive Summarization Using Stacked RNN

About The Project

The approaches to text summarization vary depending on the number of input documents (single or multiple), purpose (generic, domain specific, or query-based) and output (extractive or abstractive).

Extractive summarization identifies the important sections of a text and reproduces them verbatim, so the summary is a subset of the sentences in the original text. Abstractive summarization instead interprets and examines the text using advanced natural language techniques, then generates a new, shorter text that conveys the most critical information from the original.
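To make the extractive idea concrete, here is a minimal sketch that scores sentences by average word frequency and returns the top ones verbatim. It is only an illustration of extractive selection, not the stacked LSTM model in this repo; the function name extractive_summary and the frequency-based scoring are assumptions made for the example.

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Return the top-scoring sentences, verbatim and in original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document (lowercased).
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]
```

Every returned sentence is copied unchanged from the input, which is what distinguishes an extractive summary from an abstractive one.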

Why do we need this?

  • A summary is meant to inform a reader who has not read the text or seen the presentation of what the text is about. It describes the purpose or main idea of the text and summarizes the supporting arguments that develop that idea.

Built With

Getting Started

Below are the basic steps to reproduce the results with a few commands.

Language: Python 3.0+

  1. Clone the repository
git clone https://github.com/Shandilya21/extractive_summarization.git

Prerequisites

pip install -r requirement.txt

Before running experiments, set the data path in the config: in config.py, change DATA_PATH to the location of your data.
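As a rough sketch, the relevant line in config.py might look like the following. Only the name DATA_PATH comes from this README; the default value data/train and the SUMMARIZATION_DATA_PATH environment variable are assumptions for illustration, so check the actual config.py in the repo.

```python
# config.py (illustrative sketch; only DATA_PATH is named in this README)
import os

# Assumption: the data lives in ./data/train relative to where you run the code.
# The environment variable below is hypothetical and only shows one way to
# override the path without editing the file.
DATA_PATH = os.environ.get("SUMMARIZATION_DATA_PATH", os.path.join("data", "train"))
```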

Dataset: Data can be downloaded from here (Raw Documents) and (Summary). Create the data/train folder and place the data inside it. You may also create a test set from the split (defined in code) to check the performance of the model.

Pretrained Weights: Download the pretrained weights from here (GloVe). Save the file inside data/embeddings/glove, or anywhere else you prefer.
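GloVe text files store one word per line followed by its vector components, separated by spaces. A minimal sketch for reading such a file into a dictionary is shown below; the function name load_glove is hypothetical, and the repo's own loading code may differ.

```python
def load_glove(path):
    """Parse a GloVe text file into a {word: [float, ...]} dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # First field is the word, the rest are the vector components.
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings
```

Pass it the path of whichever GloVe file you downloaded, for example a file placed under data/embeddings/glove.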

Usage

This repo covers the 3-word-window and 5-word-window architectures for extractive text summarization; you can also produce results for similar hyperparameters. The default number of epochs is 5, and you can change it in run.sh.

chmod +x run.sh
bash run.sh   
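For intuition about the 3-word and 5-word windows mentioned above, one common reading is a sliding window of consecutive tokens. The sketch below shows that reading only; it is an assumption, the helper name word_windows is hypothetical, and the repo may build its windows differently.

```python
def word_windows(tokens, size):
    """Return every run of `size` consecutive tokens, in order."""
    if size <= 0:
        raise ValueError("size must be positive")
    # A text shorter than the window yields no windows at all.
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


tokens = "a summary informs the reader".split()
three_word = word_windows(tokens, 3)  # three overlapping windows
five_word = word_windows(tokens, 5)   # a single window covering all tokens
```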

Results

Model   Ratio   Acc
1       0.31    71.24
3       0.37    77.88
5       0.41    80.08

Roadmap

See the open issues for a list of proposed features (and known issues). If you run into a problem, feel free to open a new issue.

Contributing

Contributions are what make the project such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b build/newfeature)
  3. Commit your Changes (git commit -m 'Add some newfeature')
  4. Push to the Branch (git push origin build/newfeature)
  5. Open a Pull Request

Contact

Arunav Shandilya - arunavshandilya96@gmail.com