Topic Modeling is a type of dimensionality reduction that helps reveal latent topics in large bodies of text. In this analysis, I use one flavor of Topic Modeling, Nonnegative Matrix Factorization (NMF), to see what each of the 2020 Democratic presidential candidates focused on during the debates.
To learn more about Topic Modeling, see my two-page mathematical overview of Topic Modeling methods, including SVD, NMF, and LDA: https://github.com/branden-ciranni/papers/blob/main/Mathematics_of_Topic_Modeling.pdf
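For a feel of what NMF does to a document-term matrix, here is a minimal sketch using only the Python standard library. It is not the code from the notebooks: the four toy documents, the helper names (`transpose`, `matmul`, `update`, `nmf`, `squared_error`), and the choice of the classic multiplicative-update rule are all assumptions made for illustration. A real analysis would normally rely on a library implementation instead.

```python
import random


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def update(M, num, den, eps):
    """Element-wise multiplicative update: M * num / (den + eps)."""
    return [
        [m * n / (d + eps) for m, n, d in zip(m_row, n_row, d_row)]
        for m_row, n_row, d_row in zip(M, num, den)
    ]


def nmf(V, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) as W (docs x topics) times H (topics x terms)."""
    rng = random.Random(seed)
    # Start from small positive random factors so every entry stays nonnegative.
    W = [[rng.random() + eps for _ in range(n_topics)] for _ in V]
    H = [[rng.random() + eps for _ in V[0]] for _ in range(n_topics)]
    for _ in range(n_iter):
        Wt = transpose(W)
        H = update(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        W = update(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H


def squared_error(V, W, H):
    """Squared Frobenius distance between V and its reconstruction W H."""
    WH = matmul(W, H)
    return sum(
        (v - a) ** 2 for v_row, a_row in zip(V, WH) for v, a in zip(v_row, a_row)
    )


# Hypothetical toy corpus; the real input would be the debate transcripts.
docs = [
    "healthcare insurance medicare healthcare",
    "medicare insurance coverage healthcare",
    "climate energy carbon climate",
    "carbon energy emissions climate",
]
vocab = sorted({word for doc in docs for word in doc.split()})
# Document-term matrix of raw word counts.
V = [[doc.split().count(term) for term in vocab] for doc in docs]

W, H = nmf(V, n_topics=2)

# Each row of H weights the vocabulary for one topic; show its heaviest terms.
for k, row in enumerate(H):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(f"topic {k}:", [term for _, term in top])
```

Each row of `W` says how strongly a document loads on each topic, and each row of `H` says which terms define a topic. Because both factors are constrained to be nonnegative, topics combine additively, which tends to make them easier to read than SVD components.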
Two notebooks are present:
Scraping Debate Transcripts.ipynb: Data Scraping, Cleaning, and Transformation
eda_topic_modeling_where_candidates_focus.ipynb: EDA, Text Preprocessing, Topic Modeling
Take a look at this dataset on Kaggle: https://www.kaggle.com/brandenciranni/democratic-debate-transcripts-2020