
Can we quantify how much information is lost through summarization?

Primary language: Jupyter Notebook

SummaryInfoLoss

How much information do we lose when we summarize? This project uses a deep learning classifier built with PyTorch to examine how the ranking of podcast episodes is affected when episodes are ranked on their summaries rather than on the full episode transcripts.
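One way to put a number on "how the ranking is affected" is a rank correlation between the two orderings. The README does not say which metric the notebooks use, so the sketch below is only an illustration: it computes Spearman's rank correlation in plain Python, and the names `ranks`, `spearman`, `transcript_scores`, and `summary_scores` are hypothetical, as are the example scores.

```python
def ranks(scores):
    """Rank items by descending score (1 = best); ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(scores_a, scores_b):
    """Spearman rank correlation; this closed form assumes no tied scores."""
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical classifier scores for four episodes (not project data).
transcript_scores = [0.9, 0.7, 0.4, 0.2]
summary_scores = [0.8, 0.3, 0.5, 0.1]
rho = spearman(transcript_scores, summary_scores)
```

A value near 1 would mean the summary-based ranking closely follows the transcript-based one. Lower values would indicate that summarization changed the ordering.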

How to run (execute the notebooks in this order):

  1. subreddit_processing/scrape_subreddits.ipynb
  2. subreddit_processing/get_news_from_urls.ipynb
  3. scrape_summaries.ipynb
  4. train_bow_classifier.ipynb
  5. transcript_analysis.ipynb
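The notebooks above can also be executed from a script rather than by hand. The following is a minimal sketch, assuming the paths are relative to the repository root, that `jupyter nbconvert` is installed, and that each notebook runs without interactive input (the scraping notebooks may need credentials or network access that the README does not describe). The helper names `build_command` and `run_all` are hypothetical.

```python
import subprocess

# Notebook order as listed in the README.
NOTEBOOKS = [
    "subreddit_processing/scrape_subreddits.ipynb",
    "subreddit_processing/get_news_from_urls.ipynb",
    "scrape_summaries.ipynb",
    "train_bow_classifier.ipynb",
    "transcript_analysis.ipynb",
]


def build_command(notebook):
    """Build the nbconvert command that executes one notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", notebook]


def run_all(notebooks=NOTEBOOKS):
    """Run each notebook in order, stopping at the first failure."""
    for notebook in notebooks:
        subprocess.run(build_command(notebook), check=True)
```

Calling `run_all()` from the repository root would execute the five notebooks in sequence. `check=True` makes a failing notebook raise an exception, so later steps do not run on missing data.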

Trained PyTorch DNN classifier:

checkpoint.pth.tar
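The README does not document what the checkpoint contains, so it can help to inspect it before trying to restore a model. The sketch below loads the file on the CPU with `torch.load` and lists its top-level entries. The helper names `summarize_checkpoint` and `load_checkpoint` are hypothetical, and the assumption that the file holds a dictionary is a guess based on common PyTorch practice, not on the source.

```python
def summarize_checkpoint(checkpoint):
    """Return a {key: short description} map for a loaded checkpoint object."""
    if not isinstance(checkpoint, dict):
        # Not the assumed layout: report only the object's type.
        return {"<object>": type(checkpoint).__name__}
    summary = {}
    for key, value in checkpoint.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Tensors and arrays: include their shape.
            summary[key] = f"{type(value).__name__}{tuple(shape)}"
        else:
            summary[key] = type(value).__name__
    return summary


def load_checkpoint(path="checkpoint.pth.tar"):
    """Load the checkpoint onto the CPU; only do this for files you trust."""
    import torch  # imported lazily so summarize_checkpoint works without PyTorch

    # Recent PyTorch releases may refuse to unpickle non-tensor objects by
    # default; if that happens, consult the torch.load documentation.
    return torch.load(path, map_location="cpu")
```

Printing `summarize_checkpoint(load_checkpoint())` shows which keys are present, which indicates how the classifier weights were saved.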