This repository is the reproduction package for "Resonant Journalism in the Russo-Ukrainian War: A Topic Modeling Approach to Key-Point Detection", the thesis for my BSc in Economics (Universidade de Brasília, Brazil).
This study explores how unstructured data can be used to detect key moments in major global events. We detect key points of the Russo-Ukrainian War by applying topic modeling to a newly curated, large-scale dataset of news stories. We then investigate whether differences in topic distributions across major news outlets reveal which outlets set trends in reporting.
All.parquet is a dataframe of 61,165 news stories about Russia and Ukraine from 11 international news sources, published between July 2021 and December 2022. Its columns are Date, URL, Title and Text. The same dataset is also available in the processed data directory as All_n10.parquet, with the Novelty, Resonance and Transience measures (time scale = 10) and the Topic detected by LDA added as extra columns. The 11 sources are:
- ABC, AP, CBS, CNN, DailyMail, Express, Fox, Guardian, Mirror, NY Times, Reuters
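For reference, a sketch of how these measures are usually defined, assuming they follow the KLD-based formulation of Barron et al. (2018). Here we assume stories are sorted by date, $p^{(j)}$ is the LDA topic distribution of story $j$ over $K$ topics, and the time scale $w = 10$ counts neighbouring stories; whether the repository counts stories or days is an assumption, not something stated above.

$$
\begin{aligned}
\mathrm{KLD}\left(p^{(j)} \,\|\, p^{(k)}\right) &= \sum_{i=1}^{K} p^{(j)}_i \log_2 \frac{p^{(j)}_i}{p^{(k)}_i} \\
\mathcal{N}_w(j) &= \frac{1}{w} \sum_{d=1}^{w} \mathrm{KLD}\left(p^{(j)} \,\|\, p^{(j-d)}\right) \\
\mathcal{T}_w(j) &= \frac{1}{w} \sum_{d=1}^{w} \mathrm{KLD}\left(p^{(j)} \,\|\, p^{(j+d)}\right) \\
\mathcal{R}_w(j) &= \mathcal{N}_w(j) - \mathcal{T}_w(j)
\end{aligned}
$$

High novelty means a story departs from the stories before it. Low transience means the stories after it stay close to its content. High resonance therefore flags stories whose content persists, which makes them candidate key points.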
The analysis uses:
- Latent Dirichlet allocation (LDA) topic modeling, implemented following Barron et al. (2018)
- Kullback-Leibler divergence (KLD)-based measures of Novelty, Resonance and Transience
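A minimal pure-Python sketch of the KLD-based measures, assuming the Barron et al. (2018) definitions and a window of `w` documents. This is not the repository's code. The names `kld` and `novelty_transience_resonance` are hypothetical. Two further choices are also assumptions: the `eps` guard against zero probabilities, and returning `None` for documents that lack a full window of predecessors or successors.

```python
import math
from typing import List, Optional, Sequence, Tuple

Measures = Tuple[List[Optional[float]], List[Optional[float]], List[Optional[float]]]


def kld(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KLD(p || q) in bits; terms where p_i == 0 contribute nothing."""
    # eps is an assumed guard against q_i == 0 (LDA output is normally strictly positive)
    return sum(pi * math.log2(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def novelty_transience_resonance(topics: List[Sequence[float]], w: int = 10) -> Measures:
    """Windowed Novelty, Transience and Resonance for time-ordered documents.

    topics[j] is the topic distribution of document j, sorted by date.
    Documents without w predecessors (novelty) or w successors (transience)
    get None, and so does their resonance (an assumed edge-handling choice).
    """
    n = len(topics)
    novelty: List[Optional[float]] = []
    transience: List[Optional[float]] = []
    resonance: List[Optional[float]] = []
    for j in range(n):
        # Novelty: mean divergence from the w preceding documents
        nov = (
            sum(kld(topics[j], topics[j - d]) for d in range(1, w + 1)) / w
            if j >= w
            else None
        )
        # Transience: mean divergence from the w following documents
        tra = (
            sum(kld(topics[j], topics[j + d]) for d in range(1, w + 1)) / w
            if j + w < n
            else None
        )
        novelty.append(nov)
        transience.append(tra)
        # Resonance: novelty minus transience, defined only when both exist
        resonance.append(nov - tra if nov is not None and tra is not None else None)
    return novelty, transience, resonance


# Toy example with w = 1: story 1 departs from story 0, and story 2 repeats story 1.
toy = [[0.5, 0.5], [1.0, 0.0], [1.0, 0.0]]
nov, tra, res = novelty_transience_resonance(toy, w=1)
print(nov[1], tra[1], res[1])  # 1.0 0.0 1.0
```

In the toy run, story 1 is novel (1 bit away from its predecessor) and not transient (its successor repeats it), so it gets positive resonance. With the real data, `w = 10` would match the time scale used for All_n10.parquet. Applying this to 61,165 stories in pure Python is slow, so a vectorised version (e.g. with NumPy) would be the practical choice.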