
More Effective Than We Thought

This repository contains the replication material of the article "More Effective Than We Thought: Accounting for Legislative Hitchhikers Reveals a More Inclusive and Productive Lawmaking Process", to be published in the American Journal of Political Science, by Andreu Casas, Matthew Denny, and John Wilkerson.

Data

The ./data/ directory contains the necessary data to replicate the analytical figures and tables of the paper. Below, we describe each of the datasets in this directory:

  • main_db.csv: This is the main bill-level dataset. A detailed description of each variable can be found in codebook.pdf.
  • house_assignments_103-115-3.xls: House committee assignments dataset (103rd through 115th Congress), from Charles Stewart III's website.
  • senators_103-115-2.xls: Senate committee assignments dataset (103rd through 115th Congress), from Charles Stewart III's website.
  • LEPData93to110Congresses.xlsx: Legislative Effectiveness Scores (93rd through 110th Congress), from the Center for Effective Lawmaking (Volden and Wiseman).
  • LEPData111to113Congresses.xlsx: Legislative Effectiveness Scores (111th through 113th Congress), from the Center for Effective Lawmaking (Volden and Wiseman).
  • hr146_bi80_uni90_labeled.csv: A dataset of manually labeled potential hitchhikers inserted into 111-HR-146. The true_match_num2 variable indicates whether the bill-law pair has been labeled as a hitchhiker, version_a is the BillID of the potential hitchhiker, version_b is the target law 111-HR-146, and the remaining columns are the pairwise features described in Supporting Information B of the paper.
  • members_comm_assign_w_MemberID: Member- and Congress-level committee assignments. The member IDs can be matched to further member-level information in main_db.csv. For a description of the committee topics (variable cMem), see Table 3 in codebook.pdf.
  • ./data/predictions/: Files related to the hitchhiker discovery process, namely the hitchhikers predicted at each stage of the process by the best and highest-performing models. These files are used in 09-supporting-info-C-summary-of-hitchhiker-discovering-process.R to replicate Table 3 in Supporting Information C, where we summarize the process and report the ensemble precision and recall at each stage. Files starting with labs_db contain the data used to train all models in each iteration. Files starting with best_models describe the first group of best models in each iteration (after applying the precision-recall filter), and files starting with best_best_models describe the best models in each iteration after applying an extra filter (excluding models that predict more than 10 hitchhikers to be inserted into more than one law). Files beginning with ensemble_preds contain the hitchhikers predicted at each stage by the ensemble of high-performing models, and files starting with crossval_res report the cross-validated accuracy of the ensemble model at each iteration.
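One possible reading of the extra filter behind the best_best_models files can be sketched as follows. This is a minimal Python illustration (the repository's own scripts are in R), not the authors' code: it assumes each model's predictions have been flattened into hypothetical (model_id, hitchhiker_bill, target_law) tuples, and the function name apply_extra_filter and this input layout are assumptions rather than the format of the files in ./data/predictions/.

```python
from collections import defaultdict


def apply_extra_filter(predictions, max_multi_law=10):
    """Keep models that predict at most `max_multi_law` hitchhikers
    inserted into more than one law (hypothetical helper).

    predictions: iterable of (model_id, hitchhiker_bill, target_law) tuples
    (assumed layout). Returns a sorted list of model IDs that pass.
    """
    # model_id -> hitchhiker bill -> set of laws it is predicted to be inserted into
    laws = defaultdict(lambda: defaultdict(set))
    for model_id, bill, law in predictions:
        laws[model_id][bill].add(law)

    kept = []
    for model_id, by_bill in laws.items():
        # Count hitchhikers that this model places into more than one law
        n_multi = sum(1 for targets in by_bill.values() if len(targets) > 1)
        if n_multi <= max_multi_law:
            kept.append(model_id)
    return sorted(kept)


# Illustrative toy input (not real data): m1 puts bill_A into two laws.
preds = [("m1", "bill_A", "law_1"), ("m1", "bill_A", "law_2"),
         ("m2", "bill_A", "law_1")]
print(apply_extra_filter(preds, max_multi_law=0))  # ['m2']
```

Models with no predictions at all never appear in the grouped dictionary, so this sketch simply does not return them; a real implementation would start from the full list of best_models IDs.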

Code

The ./code/ directory contains separate scripts to replicate each analytical figure in the article. The ./figures/ directory contains a copy of each of the figures generated by these scripts.

  • 03-models.R: Code to replicate Figure 4 and the model coefficients in Table 4 (in Supporting Information D), which show the relationship between a set of covariates and the probability of a bill being enacted as a stand-alone law or as a hitchhiker bill.

  • 04-figure5-general-effects-on-effectiveness.R: Code to replicate Figure 5 of the paper, showing how counting hitchhikers as enacted legislation increases the proportion of different types of members who get at least one bill enacted in a given Congress.

  • 05-figure6-LES-v-our-measure-of-effectiveness.R: Code to replicate Figure 6 of the paper, comparing our measure of effectiveness (legislation enacted as a proportion of legislation introduced) with the Legislative Effectiveness Scores of Volden and Wiseman. This script generates two figures: figure6a-LES-vs-OUR-indiv-diff.png and figure6b-LES-vs-OUR-indiv-diff-FULL-DIST.png. For the article, we manually placed the second figure in the upper right corner of the first. The first shows a truncated distribution, whereas the second shows the full distribution we aim to capture in this section of the article.

  • 07-supporting-info-A-preprocessing.R: Code to replicate the text pre-processing procedure described in Supporting Information A of the paper. We remove all procedural text and sections that should not be taken into account when comparing the substantive content of bills, as well as uninformative words such as stop words and other frequent tokens (e.g. section, act, secretary). For simplicity, this script shows how to pre-process two example bills (103-HR-1-IH and 103-HR-2-RH). The same process can then be applied to all the bill versions collected for the study; the text files for the remaining bill versions can be downloaded from congress.gov. The two example raw files are located in the ./data/bills/raw/ directory, and the pre-processed versions are in ./data/bills/clean/.

  • Supporting Information B: We use the document_similarities() function of the SpeedReader package, written by one of the authors of the article (Matthew Denny), to perform the pairwise comparisons of bills and to extract the features described in Supporting Information B of the article.

  • 08-supporting-info-C-stage01-predicted-hitchhikers.R: Code to replicate Figure 8 of the paper, which shows the distribution of the number of models (out of 99 high-performing models) that predicted the same hitchhiker in the first stage of the hitchhiker discovery process.
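The quantity plotted in Figure 8 (how many models agree on each predicted hitchhiker) can be tallied roughly as in the minimal Python sketch below (the repository's scripts are in R). It is not taken from 08-supporting-info-C-stage01-predicted-hitchhikers.R: the function name model_agreement_distribution and the assumed input, a dictionary mapping each model to its predicted (hitchhiker_bill, target_law) pairs, are hypothetical.

```python
from collections import Counter


def model_agreement_distribution(model_predictions):
    """Tally how many models predicted each hitchhiker, then summarize.

    model_predictions: dict mapping model_id -> iterable of predicted
    (hitchhiker_bill, target_law) pairs (assumed layout).
    Returns a dict mapping "number of agreeing models" -> number of pairs.
    """
    votes = Counter()
    for pairs in model_predictions.values():
        # set() so that a model is counted at most once per pair
        votes.update(set(pairs))
    # Distribution of vote counts across all predicted pairs
    return dict(sorted(Counter(votes.values()).items()))


# Illustrative toy input (not real data)
preds = {
    "m1": [("bill_A", "law_1"), ("bill_B", "law_1")],
    "m2": [("bill_A", "law_1")],
    "m3": [("bill_A", "law_1"), ("bill_C", "law_2")],
}
print(model_agreement_distribution(preds))  # {1: 2, 3: 1}
```

Here two pairs are predicted by a single model and one pair by all three, which is the kind of histogram Figure 8 summarizes over the 99 high-performing models.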