Given word vectors trained by word2vec (Mikolov et al. 2013) or fastText (Bojanowski et al. 2016), this program projects the vectors of U.S. senators onto a "conservative" to "liberal" axis. The scalar components of such projections may be interpreted as a valid metric of political ideology.
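As a rough illustration of the idea, the sketch below computes a scalar projection in plain Python. It assumes the axis is the difference between the "conservative" and "liberal" vectors, which is one common way to define such an axis and not necessarily how this repo does it. The function name `scalar_projection` and the toy 3-dimensional vectors are made up for the example.

```python
import math


def scalar_projection(vec, pos_pole, neg_pole):
    """Return the scalar component of vec along the axis neg_pole -> pos_pole.

    Assumption: the axis is the difference of the two pole vectors.
    """
    axis = [p - n for p, n in zip(pos_pole, neg_pole)]
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0:
        raise ValueError("the two pole vectors are identical")
    return sum(v * a for v, a in zip(vec, axis)) / norm


# Toy vectors for illustration only; real embeddings have hundreds of dimensions.
conservative = [1.0, 0.0, 0.0]
liberal = [-1.0, 0.0, 0.0]
senator_a = [0.5, 0.2, 0.1]
senator_b = [-0.3, 0.4, 0.0]

score_a = scalar_projection(senator_a, conservative, liberal)
score_b = scalar_projection(senator_b, conservative, liberal)
print(score_a, score_b)
```

With this sign convention, a larger score means the vector lies closer to the "conservative" end of the axis.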
Learn more about this project here. See this IPython notebook for complete experiment results.
Plotting the vector-projected ideology against DW-NOMINATE, an ideology metric widely used in political science, reveals a strong correlation:
| Training Corpus | Avg. Pearson’s r | Avg. Spearman’s ρ |
|---|---|---|
| NYT 1981 - 2016 | 0.7559 | 0.7602 |
| Wash Post 1977 - 2007 | 0.7902 | 0.8003 |
| WSJ 1997 - 2017 | 0.7205 | 0.7184 |
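For readers unfamiliar with the two statistics in the table, here is a minimal standard-library sketch of how Pearson's r and Spearman's ρ can be computed for one session. The helper names and the toy scores are hypothetical and are not taken from this repo, which may well rely on a statistics library instead.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy data for illustration only: projected scores vs. DW-NOMINATE scores.
projected = [0.5, -0.3, 0.1, -0.6]
dw_nominate = [0.4, -0.2, 0.3, -0.5]

r = pearson_r(projected, dw_nominate)
rho = spearman_rho(projected, dw_nominate)
print(r, rho)
```

Pearson's r measures linear agreement between the two scores, while Spearman's ρ only checks whether the senators are ordered the same way.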
In addition to members of Congress, you can also project vectors of public policies. These results are quite amusing but still highly experimental. Again, a detailed account is available here.
Gensim (optional; only needed for the experimental feature of projecting public policies)
The DW-NOMINATE ideology data is available at voteview.com. Some example data is already included in this repo.
I apologize that I have tested the code only with Python 3.6.
There are two methods for loading vectors into PoliVec Projector:
First Method: Use the Word2VecProjector class to read vector files generated by word2vec. Call evaluate_ideology_projection() to evaluate a single congressional session of ideology data. Call multiyear_evaluation() and pass an iterable of session numbers, e.g. multiyear_evaluation(cgrs_sess=range(97,115)), to evaluate multiple years of data. The IPython notebook includes several examples that will help you get started.
Second Method: (seemingly more complicated, but more efficient for comparing multiple years of data) Provide a plain text list of words you want to query, along with the axes onto which you want to project. The axes follow the order of: [positive x axis, negative x axis, positive y axis, negative y axis]. An example queries.txt looks like this:
conservative
liberal
good
bad
johnson
nixon
carter
reagan
etc.
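If you want to assemble such a file by hand, a sketch like the following may help. It is a hypothetical helper, not the gen_queries.py script from this repo. It assumes the four axis words come first, followed by the names, one lowercase word per line as in the example above.

```python
def build_queries(axes, names):
    """Return the text of a queries file: four axis words, then the names.

    axes is expected in the order [+x, -x, +y, -y].
    Lowercasing is an assumption based on the example file.
    """
    if len(axes) != 4:
        raise ValueError("expected four axis words: +x, -x, +y, -y")
    words = [w.strip().lower() for w in list(axes) + list(names)]
    return "\n".join(words) + "\n"


axes = ["conservative", "liberal", "good", "bad"]
names = ["Johnson", "Nixon", "Carter", "Reagan"]
queries_text = build_queries(axes, names)
print(queries_text, end="")

# To save the list, write queries_text to a file of your choice, e.g.:
# with open("queries.txt", "w") as f:
#     f.write(queries_text)
```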
If you are interested in members of Congress, the gen_queries.py script in this repo can take care of this step for you. The name lists of the 95th - 114th Senate (1977 - 2017) are also already included in this repo at queries/.
Then, run gen_vectors.sh, which takes multiple lists of queries and feeds them to fastText's print-word-vectors function. Be sure to revise the directories specified in the shell script so that it loads your own pre-trained fastText models. The vectors of members of the 95th to 114th Senate are also already included in this repo at queried_vectors/.
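To inspect the queried vectors yourself, the sketch below parses text in the layout that fastText's print-word-vectors normally emits: one line per word, with the word followed by its space-separated components. The function name `parse_vector_lines` and the sample lines are made up for illustration, and the files in queried_vectors/ are only assumed to follow this layout.

```python
def parse_vector_lines(lines):
    """Parse lines of the form 'word v1 v2 ... vn' into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Made-up sample in the assumed output layout (3 dimensions for brevity).
sample_output = [
    "conservative 0.12 -0.40 0.33",
    "liberal -0.25 0.10 0.05",
    "",
    "reagan 0.30 -0.22 0.18",
]

vectors = parse_vector_lines(sample_output)
print(sorted(vectors))
```

For a real file, pass an open file handle instead of the sample list, since iterating over a file yields its lines.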
Lastly, create a FastTextProjector object to load the queried vectors, then call evaluate_ideology_projection() or multiyear_evaluation().
(In principle, you can use PoliVec Projector with any word embedding model, so long as you make a subclass and tweak the file IO methods to load your vectors properly.)
MIT