HashemAlsaket/SummaryInfoLoss

Can we quantify how much information is lost through summarization?


SummaryInfoLoss

How much information do we lose when we summarize? This project uses a deep learning classifier built with PyTorch to examine how the ranking of podcast episodes is affected when episodes are ranked on their summaries rather than on the transcripts of the whole episodes.
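One way to put a single number on how much the ranking changes is a rank correlation between the episode order produced from summaries and the order produced from full transcripts. The sketch below assumes each episode already has a classifier score for both inputs (the score dictionaries and episode IDs are hypothetical) and computes Kendall's tau. It illustrates the idea and is not necessarily the metric the notebooks use.

```python
from itertools import combinations


def rank_by_score(scores: dict[str, float]) -> list[str]:
    """Return episode IDs sorted from highest to lowest classifier score."""
    return sorted(scores, key=scores.get, reverse=True)


def kendall_tau(rank_a: list[str], rank_b: list[str]) -> float:
    """Kendall's tau between two rankings of the same items (assumes no ties).

    1.0 means identical order, -1.0 means fully reversed order.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("rankings must contain the same items")
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    # combinations() yields (x, y) with x ahead of y in rank_a; the pair is
    # concordant if x is also ahead of y in rank_b.
    for x, y in combinations(rank_a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / n_pairs if n_pairs else 1.0


# Hypothetical scores, for illustration only.
transcript_scores = {"ep1": 0.91, "ep2": 0.75, "ep3": 0.40, "ep4": 0.10}
summary_scores = {"ep1": 0.80, "ep2": 0.30, "ep3": 0.55, "ep4": 0.05}
tau = kendall_tau(rank_by_score(transcript_scores), rank_by_score(summary_scores))
```

Here the summary swaps `ep2` and `ep3`, so one of the six episode pairs is discordant and tau is (5 - 1) / 6, about 0.67. A value well below 1.0 across many episodes would suggest the summaries drop information the classifier relies on.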

How to run (run the notebooks in the order listed):

subreddit_processing/scrape_subreddits.ipynb
subreddit_processing/get_news_from_urls.ipynb
scrape_summaries.ipynb
train_bow_classifier.ipynb
transcript_analysis.ipynb
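The `train_bow_classifier.ipynb` step trains a bag-of-words classifier. As a rough illustration of what a bag-of-words input looks like, the sketch below maps texts to fixed-length count vectors over a shared vocabulary. The tokenizer, the `max_size` limit and the function names are assumptions, not taken from the notebook. A PyTorch model would then consume these vectors, for example after converting them with `torch.tensor`.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed simple tokenizer: lowercase, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts: list[str], max_size: int = 5000) -> dict[str, int]:
    # Keep the most frequent tokens; Counter.most_common orders equal counts
    # by first appearance, so indices are deterministic.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_size))}


def bow_vector(text: str, vocab: dict[str, int]) -> list[int]:
    # Count occurrences of each in-vocabulary token; unknown tokens are dropped.
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


docs = ["the host talks about the news", "news of the week"]
vocab = build_vocab(docs)
vec = bow_vector("The news, the summary", vocab)
```

Because every text shares one vocabulary, a summary and its full transcript become vectors of the same length, so the same classifier can score both.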

Trained PyTorch DNN classifier:

checkpoint.pth.tar