Analysis published at Roll Call.
This analysis used 124,435 tweets from 68 Senate candidates who consistently tweeted in the lead-up to the 2018 midterms. The tweets, some of which date back to Feb. 14, 2009, were downloaded using “rtweet,” a software package developed by Michael W. Kearney from the University of Missouri.
The tweets are either from the candidates themselves or from their official campaign accounts. Because the “rtweet” package relies on the Twitter API, we could collect no more than 3,200 tweets per account, a limit imposed by Twitter. More prolific candidates accordingly have samples that cover narrower and more recent time spans.
We ran this dataset of tweets through a 10,222-word database, developed by researchers at the Computational Story Lab at the University of Vermont, that is designed to assign an average sentiment to text inputs. Using this dictionary, each tweet was scored on an average sentiment scale and compared to the original results.
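For readers curious how dictionary-based scoring works in principle, here is a minimal Python sketch. It is not the code used for this analysis. The word scores, the tokenizing rule and the function name `average_sentiment` are all hypothetical stand-ins; the real database contains 10,222 scored words.

```python
import re

# Hypothetical word scores for illustration only; these are NOT values
# from the 10,222-word database described above.
WORD_SCORES = {"love": 8.0, "great": 7.0, "tax": 3.0, "fail": 2.0}


def average_sentiment(text, scores=WORD_SCORES):
    """Return the mean score of the words in `text` that appear in `scores`.

    Each occurrence of a word counts once, so repeated words weigh more.
    Returns None when no word in the text is found in the dictionary.
    """
    # Simple tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    matched = [scores[w] for w in words if w in scores]
    if not matched:
        return None
    return sum(matched) / len(matched)


print(average_sentiment("We love this great state"))
```

In this sketch, a tweet with no dictionary words gets no score at all rather than a neutral one, which is one possible design choice among several.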
To validate these sentiment scores, we used a machine learning model trained on 50,000 IMDB reviews to classify each tweet as either “positive” or “negative.” We then calculated the percentage of positive tweets for each candidate.
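The per-candidate percentage can be sketched in a few lines of Python. This assumes each tweet has already been labeled by the classifier; the model itself is not reproduced here. The function name `percent_positive`, the input layout and the sample data are hypothetical.

```python
from collections import defaultdict


def percent_positive(labeled_tweets):
    """Map each candidate to the percentage of their tweets labeled 'positive'.

    `labeled_tweets` is an iterable of (candidate, label) pairs, where the
    label is the string "positive" or "negative" (an assumed layout).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for candidate, label in labeled_tweets:
        totals[candidate] += 1
        if label == "positive":
            positives[candidate] += 1
    return {c: 100.0 * positives[c] / totals[c] for c in totals}


# Made-up labels for two placeholder candidates, for illustration only.
sample = [
    ("Candidate A", "positive"),
    ("Candidate A", "negative"),
    ("Candidate A", "positive"),
    ("Candidate A", "positive"),
    ("Candidate B", "negative"),
]
print(percent_positive(sample))
```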