tracking-policy-agendas


A Framework for Tracking Legislators’ Policy Agendas



Mohammad H. Forouhesh

Metodata Inc ®

April 25, 2022

This repository contains the implementation for the following paper:

Tracking Legislators’ Expressed Policy Agendas in Real Time

Table of Contents

  1. TODO
  2. A Brief Overview
    1. Introduction
    2. Main Problem
    3. Illustrative Example
    4. I/O
    5. Motivation
    6. Related Works
    7. Contributions
    8. Proposed Method
    9. Experiments
  3. Implementation details
  4. Reproducing Results

1) Tracking Legislators’ Expressed Policy Agendas in Real Time

TODO:

  • Summarize the paper
  • Outline the implementation details
  • Implement Word2Vec
  • Train Word2Vec
  • Seed words
  • Classification heads
  • Results & analysis
  • Tests & coverage
  • Documentation
  • CI/CD
  • Smooth installation

A Brief Overview

  • Introduction:

    This work analyses legislators’ political orientation on salient policy issues through their temporally granular tweets, using word embeddings for feature extraction and a classifier to label all of a legislator’s past and current relevant tweets according to whether they express a particular issue position over time.
  • Main Problem:

    Is it possible to accurately analyse the temporal evolution of political orientation on salient issues by applying natural language processing techniques to users’ tweets?
    The issues of concern in this project are immigration and climate change.
  • Illustrative Example:

    Given a tweet about immigration policy, we first encode it using a word2vec-enhanced dictionary; a classifier then detects whether it expresses an exclusive or inclusive stance. Furthermore, these results can be disaggregated to see whether the tweet was posted by a Republican or a Democrat.
  • I/O:

    • Input: Tweets (textual modality)
    • Output: Predicted stance on the salient political issue
  • Motivation:

    1. Using tweets to track shifts in legislators’ rhetoric is highly scalable. It can be used on any topic of interest, by any political actor with a Twitter account, in any country around the world, from the past decade or into the future.
    2. Twitter data has high temporal granularity.
  • Related (Previous) Works:

    Previous work is grouped by legislators’ different channels of communication into eight categories:

    1. Stump speeches: Fenno 1978
    2. Campaign mail: Golbeck, Grimes and Rogers 2010
    3. Television advertising: Lau, Sigelman and Rovner 2007
    4. Floor speeches: Martin and Vanberg 2008; Martin 2011; Quinn et al. 2010
    5. Press releases: Grimmer 2010; Grimmer, Westwood and Messing 2014; Klüver and Sagarzazu 2016
    6. Websites: Adler, Gent and Overmeyer 1998; Anstead and Chadwick 2008; Druckman, Kifer and Parkin 2009
    7. RSS feeds: Cormack 2013
    8. Social media posts: Gulati and Williams 2010; Barbera et al. 2018; Radford and Sinclair 2016; Shapiro et al. 2014; Lilleker and Koc-Michalska 2013
  • Contributions:

    1. A simple, transparent, and interpretable approach to tweet classification that achieves satisfactory accuracy across diverse issues.
    2. An automated process for updating and maintaining the model.
    3. A dynamic, scalable, real-time method for tracking elected officials’ expressed policy positions through their tweets.
  • Proposed Method:

    • Stage I: (Feature Extraction)
      A Word2Vec-enhanced dictionary is used to encode the texts. In particular, a set of stemmed seed words relevant to the concept of interest is identified, and word embeddings are then used to find other words in the data that are semantically related to these seed words (a minimal dictionary-expansion sketch appears after this overview).
    • Stage II: Classification of political stance on salient issues.
      Choice of classifier: five-fold cross-validation is used to compare precision, recall, accuracy, balanced accuracy, and F1 scores and to select the best-performing classifier among XGBoost, Naive Bayes, Elastic Net, and Lasso (a cross-validation sketch appears after this overview).
  • Experiments:

    • Datasets:

      A dataset of the authors’ own making: the tweets of all senators and the vast majority of House members were crawled with the Twitter API over the period of interest up to 2020, excluding legislators who left office or were elected for the first time (a crawling sketch appears after this overview).

    • Results:

      Word embeddings were trained on the entire corpus of legislators’ tweets. The word2vec dictionaries are limited to the 100 words most similar to the seed words, and overly general or irrelevant terms are omitted. The detailed results provided in the paper’s appendix are summarised in the table below:

| Dataset | Issue | Classification Method | F1-score | Recall | Precision | Accuracy | Balanced Accuracy |
|---|---|---|---|---|---|---|---|
| Crawled Legislators' Tweets | Immigration (Exclusive or Not) | Naive Bayes | 0.885 | 0.853 | 0.921 | 0.813 | 0.738 |
| | | XGBoost | 0.871 | 0.909 | 0.836 | 0.795 | 0.668 |
| | | Elastic Net | 0.881 | 0.967 | 0.809 | 0.801 | 0.615 |
| | | Lasso | 0.871 | 0.962 | 0.797 | 0.784 | 0.586 |
| | Immigration (Inclusive or Not) | Naive Bayes | 0.892 | 0.865 | 0.920 | 0.830 | 0.781 |
| | | XGBoost | 0.888 | 0.916 | 0.861 | 0.828 | 0.746 |
| | | Elastic Net | 0.890 | 0.978 | 0.817 | 0.821 | 0.674 |
| | | Lasso | 0.894 | 0.974 | 0.826 | 0.828 | 0.691 |
| | Climate Change (No Action or Not) | Naive Bayes | 0.889 | 0.874 | 0.904 | 0.827 | 0.742 |
| | | XGBoost | 0.888 | 0.896 | 0.880 | 0.818 | 0.698 |
| | | Elastic Net | 0.891 | 0.963 | 0.830 | 0.811 | 0.575 |
| | | Lasso | 0.892 | 0.965 | 0.830 | 0.813 | 0.576 |
| | Climate Change (Take Action or Not) | Naive Bayes | 0.687 | 0.742 | 0.640 | 0.758 | 0.746 |
| | | XGBoost | 0.678 | 0.694 | 0.662 | 0.736 | 0.729 |
| | | Elastic Net | 0.706 | 0.764 | 0.655 | 0.745 | 0.748 |
| | | Lasso | 0.700 | 0.764 | 0.646 | 0.738 | 0.742 |
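The dictionary-expansion step of Stage I can be illustrated with a minimal sketch. This is not the repository's exact code: gensim's `Word2Vec`, the placeholder corpus, and the seed stems `immigr`/`border` are assumptions made for the example.

```python
# Minimal Stage I sketch (assumed gensim-based implementation, illustrative seed words).
from gensim.models import Word2Vec

# Tokenised, stemmed tweets (placeholder corpus; the real corpus is all legislators' tweets).
corpus = [
    ["immigr", "reform", "border", "secur"],
    ["climat", "chang", "carbon", "emiss"],
]

# Train word embeddings on the tweet corpus.
model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, workers=4)

# Stemmed seed words for the concept of interest (hypothetical choices).
seed_words = ["immigr", "border"]

# Expand each seed with up to 100 of its most similar words, as described in the paper;
# overly general or irrelevant terms would then be pruned by hand.
dictionary = set(seed_words)
for seed in seed_words:
    if seed in model.wv:
        dictionary.update(word for word, _ in model.wv.most_similar(seed, topn=100))

vocab = sorted(dictionary)

def encode(tokens):
    """Bag-of-dictionary-words feature vector for one tweet."""
    return [tokens.count(word) for word in vocab]

features = [encode(tweet) for tweet in corpus]
```

The count-based `encode` helper is just one simple way to turn the expanded dictionary into classifier features; the paper itself only specifies the dictionary construction.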
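Stage II's classifier comparison can be sketched with scikit-learn's `cross_validate`. The random placeholder data, the XGBoost settings, and the penalised logistic regressions standing in for Elastic Net and Lasso are assumptions for illustration, not the authors' configuration.

```python
# Minimal Stage II sketch: compare candidate classifiers with five-fold cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 50))         # placeholder feature matrix (e.g. dictionary counts)
y = rng.integers(0, 2, size=200)  # placeholder binary stance labels

candidates = {
    "Naive Bayes": MultinomialNB(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    # Elastic Net / Lasso modelled here as penalised logistic regressions.
    "Elastic Net": LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=5000),
    "Lasso": LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
}

scoring = ["precision", "recall", "accuracy", "balanced_accuracy", "f1"]
for name, clf in candidates.items():
    scores = cross_validate(clf, X, y, cv=5, scoring=scoring)
    summary = ", ".join(f"{m}={scores['test_' + m].mean():.3f}" for m in scoring)
    print(f"{name}: {summary}")
```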
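The tweet crawl could look roughly like the Tweepy sketch below. The bearer token, the handle list, and the cut-off date are hypothetical, and the authors' actual crawler may differ (the v2 user-timeline endpoint also only reaches back over a limited number of each account's most recent tweets).

```python
# Hypothetical data-collection sketch using Tweepy (Twitter API v2).
import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # hypothetical credential

legislator_handles = ["SenatorExample"]                   # hypothetical account list

tweets = []
for handle in legislator_handles:
    user = client.get_user(username=handle)
    # Page through the account's timeline up to the end of the period of interest.
    for page in tweepy.Paginator(
        client.get_users_tweets,
        id=user.data.id,
        max_results=100,
        end_time="2020-12-31T23:59:59Z",
        tweet_fields=["created_at"],
    ):
        tweets.extend(page.data or [])
```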

Implementation details:

(Implementation flowchart: Mermaid diagram rendered via Kroki.)

Reproducing Results for XGBoost (a hedged reproduction sketch follows the table below)

| Dataset | Issue | Classification Method | F1-score | Recall | Precision | Accuracy | Balanced Accuracy |
|---|---|---|---|---|---|---|---|
| Crawled Persian Tweets | JCPOA (Relevant or Not) | Naive Bayes | 0.845 | 0.901 | 0.792 | 0.843 | 0.839 |
| | | XGBoost | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 |
| | | Passive Aggressive | 0.991 | 0.983 | 0.994 | 0.992 | 0.991 |
| | | Lasso | 0.988 | 0.985 | 0.983 | 0.984 | 0.987 |
| | Stock Market (Relevant or Not) | Naive Bayes | 0.892 | 0.865 | 0.920 | 0.830 | 0.781 |
| | | XGBoost | 0.999 | 0.999 | 1.000 | 0.999 | 0.999 |
| | | Elastic Net | 0.890 | 0.978 | 0.817 | 0.821 | 0.674 |
| | | Lasso | 0.894 | 0.974 | 0.826 | 0.828 | 0.691 |
| | Vaccination (Relevant or Not) | Naive Bayes | 0.870 | 0.92 | 0.82 | 0.855 | 0.883 |
| | | XGBoost | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Passive Aggressive | 0.975 | 0.945 | 0.965 | 0.97 | 0.95 |
| | | Lasso | 0.971 | 0.955 | 0.973 | 0.970 | 0.959 |
| | Filtering (Relevant or Not) | Naive Bayes | 0.687 | 0.742 | 0.640 | 0.758 | 0.746 |
| | | XGBoost | 0.950 | 0.951 | 0.958 | 0.954 | 0.950 |
| | | Elastic Net | 0.706 | 0.764 | 0.655 | 0.745 | 0.748 |
| | | Lasso | 0.700 | 0.764 | 0.646 | 0.738 | 0.742 |
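A hedged end-to-end sketch for reproducing the XGBoost rows is given below. It is not the repository's actual entry point: the `tweets.csv` file with `text` and `label` columns is hypothetical, and TF-IDF stands in for the word2vec-enhanced dictionary features described above.

```python
# Hypothetical reproduction sketch: TF-IDF + XGBoost evaluated with the same metrics.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

df = pd.read_csv("tweets.csv")  # hypothetical file: one labelled tweet per row

pipeline = make_pipeline(
    TfidfVectorizer(max_features=20000),
    XGBClassifier(eval_metric="logloss"),
)

scoring = ["f1", "recall", "precision", "accuracy", "balanced_accuracy"]
scores = cross_validate(pipeline, df["text"], df["label"], cv=5, scoring=scoring)
for metric in scoring:
    print(f"{metric}: {scores['test_' + metric].mean():.3f}")
```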