Primary language: Python

reddit-summarizer

Course project for the Rutgers CS NLP course, Spring 2019.

Final Presentation: Google Slides link

Resources

Tutorials & Tools

Metrics

  • ROUGE: The TLDR challenge uses the F-1 scores of ROUGE-1, ROUGE-2, and ROUGE-LCS for quantitative evaluation.
  • Qualitative evaluation is usually performed through crowdsourcing: human annotators rate each candidate summary on five linguistic qualities, as suggested by the DUC guidelines.
  • Re-evaluating Automatic Metrics for Image Captioning: This paper gives a good explanation of BLEU, METEOR, ROUGE, and CIDEr.
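To make the ROUGE numbers above concrete, here is a minimal sketch of how the three F-1 scores could be computed for a single candidate/reference pair. It is not the TLDR challenge's official scorer: it assumes lowercased whitespace tokenization, no stemming or stopword removal, and a plain balanced F-1 for the LCS variant. The function names (`rouge_n`, `rouge_l`, and the helpers) are made up for this illustration.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, cand_total, ref_total):
    """Balanced F-1 from an overlap count and the two totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F-1 using clipped n-gram overlap (assumed tokenization: split on whitespace)."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count per n-gram
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L (LCS-based) F-1 on whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat is on the mat"
    print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```

For reported results, an established ROUGE implementation is the safer choice, since details such as stemming and tokenization change the scores.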

Papers

Further thoughts and future work

  • The attention mechanism helps.
  • Compare character-level and word-level models.
  • Could we use the model's hidden vector as an embedding of the text for other tasks, such as subreddit classification?
  • LSTM methods require a lot of training data.
  • We should compare against a less statistically intensive baseline, such as latent Dirichlet allocation (LDA), and against a generic classification method such as BERT that is not tuned to the particular text being worked with.