/Topic-Modeling-Debates

Topic Modeling Analysis of the 2020 Democratic Debates & Cleanup of the Scraped Data

Democratic Primaries 2020 Analysis

Applying Topic Modeling to Analyze the 2020 Democratic Debates

Topic Modeling is a type of dimensionality reduction that helps reveal latent topics in large bodies of text. In this analysis, I use one flavor of Topic Modeling, Nonnegative Matrix Factorization (NMF), to see what each of the 2020 Democratic presidential candidates focused on during the debates.
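As a rough illustration of the idea, here is a minimal sketch of NMF in plain Python. It is not the code from the notebooks: the four toy "documents", the helper names (`build_term_matrix`, `nmf`, `top_words`), and the choice of two topics are all assumptions made for the example. It factors a document-term count matrix V into nonnegative matrices W (documents by topics) and H (topics by words) using the standard multiplicative update rules for the Frobenius-norm objective.

```python
import random
from collections import Counter


def build_term_matrix(docs):
    """Build a document-term count matrix (rows: documents, columns: words)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w, c in Counter(d.lower().split()).items():
            V[i][index[w]] = float(c)
    return V, vocab


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (n x m) as W (n x k) times H (k x m), all entries >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so the multiplicative updates can move.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of V - WH, the quantity the updates try to reduce."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def top_words(H, vocab, n=3):
    """Return the n highest-weighted words for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in H]


# Hypothetical toy documents, not taken from the debate transcripts.
docs = [
    "healthcare insurance medicare healthcare",
    "medicare insurance healthcare plan",
    "climate carbon energy climate",
    "energy carbon climate jobs",
]
V, vocab = build_term_matrix(docs)
W, H = nmf(V, k=2)
for t, words in enumerate(top_words(H, vocab)):
    print(f"topic {t}: {words}")
```

Each row of H weights the words that make up one topic, and each row of W says how strongly a document loads on each topic. Aggregating rows of W by speaker is one way such a model could be used to compare where candidates focus.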

To learn more about Topic Modeling, see my two-page summary of the mathematics behind Topic Modeling methods, including SVD, NMF, and LDA: https://github.com/branden-ciranni/papers/blob/main/Mathematics_of_Topic_Modeling.pdf

This repository contains two notebooks:

  • Scraping Debate Transcripts.ipynb: Data Scraping, Cleaning, and Transformation
  • eda_topic_modeling_where_candidates_focus.ipynb: EDA, Text Preprocessing, Topic Modeling

Take a look at this dataset on Kaggle: https://www.kaggle.com/brandenciranni/democratic-debate-transcripts-2020