Rutgers CS Spring 2019 NLP course project
Final Presentation: Google Slides link
- First presentation: Can we summarize Reddit posts?
- Autotldr is a bot that uses SMMRY to automatically summarize long Reddit submissions.
- Text Compactor tool
- TL;DR: The abstractive summarization challenge. Good dataset to use! An ongoing challenge.
- What is the state of text summarization research?
- Datasets for text document summarization?
- A Quick Introduction to Text Summarization in Machine Learning. Describes the types of techniques.
- How to Clean Text for Machine Learning with Python.
- Attention in Long Short-Term Memory Recurrent Neural Networks
- A Brief Overview of Attention Mechanism. It has good equations.
- Attention? Attention! It has good equations and introduces a family of attention mechanisms.
- DeepInf: Social Influence Prediction with Deep Learning. A very good paper for understanding the attention mechanism.
- Graph Attention Networks
- Keras Attention Mechanism
- Neural Machine Translation by Jointly Learning to Align and Translate
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Youtube video: C5W3L08 Attention Model, short but very useful attention mechanism tutorial.
- How to Develop an Encoder-Decoder Model with Attention for Sequence-to-Sequence Prediction in Keras
- attention mechanism, a blog.
- Another Keras attention implementation, with blog.
- Attention Mechanisms in Recurrent Neural Networks (RNNs) - IGGG, a one-hour video.
- What is a Transformer?
- The Illustrated Transformer
- Transformer — Attention is all you need
- Paper-with-code: Attention Is All You Need
- Codes for Transformer, BERT, etc.
- Attention Is All You Need — Transformer
- The Annotated Transformer
- Encoder-Decoder Models for Text Summarization in Keras, code.
- Text Summarization Using Keras Models
- tensor2tensor
- fairseq
- A ten-minute introduction to sequence-to-sequence learning in Keras
- Keras example code: English to French
- ml-notebooks
- Keras BERT
- keras-seq2seq-with-attention. Note: TensorFlow 1.13 and later currently have problems with this code.
- Regarding the hidden state (carry) and cell state (memory): the hidden state is the overall state of what has been seen so far, while the cell state is a selective memory of the past. The hidden state (h) carries information about what an RNN cell has seen over time and supplies it to the current timestep, so that the loss function depends not only on the data at this time instant but also on the data seen historically. link.
- Understand the Difference Between Return Sequences and Return States for LSTMs in Keras
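A minimal Keras sketch (layer sizes and input shapes are made up) contrasting `return_sequences` and `return_state`, to make the hidden state h vs. cell state c distinction above concrete:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

timesteps, features, units = 5, 3, 4              # hypothetical sizes
inputs = Input(shape=(timesteps, features))

# return_sequences=True: hidden state h at every timestep (what attention attends over).
# return_state=True: additionally return the final h and the final cell state c.
all_h, last_h, last_c = LSTM(units, return_sequences=True, return_state=True)(inputs)
model = Model(inputs, [all_h, last_h, last_c])

x = np.random.rand(1, timesteps, features).astype("float32")
seq, h, c = model.predict(x)
print(seq.shape)  # (1, 5, 4): h at each timestep
print(h.shape)    # (1, 4):    final hidden state (equals seq[:, -1, :])
print(c.shape)    # (1, 4):    final cell state, the "selective memory"
```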
- Without attention (in translation task):
- Words that only appear once or twice in the training data get mis-translated. (Not enough data)
- Words whose position differs between input and output sentences get mis-translated, e.g. a word that appears at the start of the English sentence but at the end of the Spanish one.
- The dataset contains many sentences with different translations. These will always incur errors in our model.
- Attention is a mechanism designed to help fix this temporal limitation.
- The attention mechanism increases the computational burden of the model, but results in a more targeted and better-performing model.
- In addition, the attention model is also able to show how attention is paid to the input sequence when predicting the output sequence.
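A minimal NumPy sketch of dot-product attention over encoder hidden states (shapes and names are illustrative, not the exact formulation used in any of the papers above); the attention weights are what can be visualized to show how attention is paid to the input sequence:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 6, 8                               # hypothetical: 6 source timesteps, size-8 states
encoder_states = np.random.rand(T, d)     # h_1 ... h_T from the encoder
decoder_state = np.random.rand(d)         # current decoder hidden state s_t

scores = encoder_states @ decoder_state   # alignment score for each source position
weights = softmax(scores)                 # attention weights, sum to 1
context = weights @ encoder_states        # weighted sum: context vector fed to the decoder

print(weights.round(3), context.shape)    # weights show "where" the model attends
```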
- ROUGE: the TL;DR challenge uses F1 scores for ROUGE-1, ROUGE-2, and ROUGE-L (LCS) as its quantitative evaluation.
- Usually, a qualitative evaluation is performed through crowdsourcing: human annotators rate each candidate summary according to five linguistic qualities, as suggested by the DUC guidelines.
- Re-evaluating Automatic Metrics for Image Captioning: this paper has a good explanation of BLEU, METEOR, ROUGE, and CIDEr.
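A rough Python sketch of ROUGE-1 F1 on unigrams (no stemming or stopword handling), only to make the metric concrete; the challenge's official scorer should be used for real evaluation:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())      # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat is on the mat"))  # ~0.83
```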
- A very useful collection of state-of-the-art related work (since 2015) and their implementations.
- Abstractive Summarization of Reddit Posts with Multi-level Memory Networks. Dataset provided but no implementation.
- Get To The Point: Summarization with Pointer-Generator Networks
- Generating News Headlines with Recurrent Neural Networks
- Sequence to Sequence Learning with Neural Networks. Proposed Seq2Seq.
- A Neural Attention Model for Abstractive Sentence Summarization.
- An Improved Phrase-based Approach to Annotating and Summarizing Student Course Responses
- Attention mechanism helps.
- Comparison between character-level and word-level models.
- Could we use the model's hidden vector as an embedding of the text, and then use it for other tasks such as subreddit classification?
- LSTM methods require lots of training data.
- We should compare against a baseline model that is not so statistically intensive, such as Latent Dirichlet Allocation, as well as against a generic method like BERT that is not tuned to the particular text we are working with.
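A minimal scikit-learn sketch of the suggested Latent Dirichlet Allocation baseline; the corpus and parameters are placeholders:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [                                          # placeholder corpus
    "long post about neural networks and machine learning",
    "another post about cooking recipes and baking bread",
]
counts = CountVectorizer(stop_words="english").fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts))  # per-post topic distribution, a cheap text representation
```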