/legislation-project

Exploratory project working with U.S. Code and Congress API.


U.S. Legislation

Exploratory project using data from:

See also: congressapi

Interactive results will be hosted at us-legislation-data.appspot.com

Related Work

Loading data into MongoDB

Use congressdb.convert2mongod to import bulk Congress data into a MongoDB database. This vastly simplifies the process of generating training data sets later.

Build language model

Generate dataset

Use congressdb to build a dataset of the bills introduced in the House during the 114th Congress:

python -m congressdb.build --src=/data/congress --output=house-introduced-114 \
                             --type=hr --version=is --congress=114

This will create a directory house-introduced-114 with training, validation, and test data splits and a vocab file. See congressdb/build.py for more info.
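For intuition, output of the kind described above (three splits plus a vocab file) could be produced along these lines. This is only a sketch, not the code in congressdb/build.py: the function names split_documents and build_vocab, the 80/10/10 ratios, and the whitespace tokenization are all assumptions made for illustration.

```python
import random
from collections import Counter


def split_documents(docs, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle docs and split them into (train, valid, test) lists.

    Hypothetical helper; the ratios are assumed, not taken from congressdb.
    """
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    n_valid = int(len(docs) * valid_frac)
    n_test = int(len(docs) * test_frac)
    valid = docs[:n_valid]
    test = docs[n_valid:n_valid + n_test]
    train = docs[n_valid + n_test:]
    return train, valid, test


def build_vocab(docs, min_count=1):
    """Return tokens seen at least min_count times, most frequent first."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    # Sort by descending count, then alphabetically for a stable order.
    items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, c in items if c >= min_count]


if __name__ == "__main__":
    bills = ["A bill to amend title %d" % i for i in range(10)]
    train, valid, test = split_documents(bills)
    print(len(train), len(valid), len(test))
    print(build_vocab(train)[:5])
```

Building the vocabulary from the training split only keeps validation and test tokens from leaking into the model's input space.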

Train model

A known-good set of hyperparameters is in hparams.yaml.

python -m lm.train --data_path=house-introduced-114 --model_dir=/tmp/house-model --hparams=hparams.yaml

This will start training a model using the dataset we just generated. Snapshots of the model will go to /tmp/house-model. To visualize/monitor the training process, start TensorBoard pointed at the model directory.

tensorboard --logdir=/tmp/house-model

Sample some generated text

To create a sample from the language model, using the latest snapshot:

python -m lm.generate --model_dir=/tmp/house-model --data_path=house-introduced-114 \
                          --hyperparams=hparams.yaml --max_length=1000 --temp=1.1 \
                          --output=sample.txt

Omitting the --output flag will print to stdout.
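The --temp flag presumably sets a sampling temperature. How lm.generate applies it is not shown here. As a general illustration of temperature sampling (not the project's code), logits are divided by the temperature before the softmax, so values above 1 flatten the distribution and values below 1 sharpen it. The function names below are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temp=1.0):
    """Convert logits to probabilities after dividing by temp."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temp=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temp)
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    print(softmax_with_temperature(logits, temp=1.0))
    print(softmax_with_temperature(logits, temp=1.1))
    print(sample_token(logits, temp=1.1, rng=random.Random(0)))
```

With temp=1.1 the most likely token gets slightly less probability mass than at temp=1.0, which makes samples a little more varied.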

Evaluate on test set

python -m lm.evaluate --model_dir=/tmp/house-model --data_path=house-introduced-114 \
                      --hparams=hparams.yaml

Using the hyperparameters in hparams.yaml and training for 274K iterations (~9 epochs), we end up with a test set perplexity of 13.1.

The model generates clauses of legislation that look like this:
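For reference, perplexity is conventionally the exponential of the average per-token negative log-likelihood (in nats). The sketch below shows that relationship in plain Python. It is not the code in lm.evaluate, and the function name is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from the natural-log probability of each target token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


if __name__ == "__main__":
    # A model that assigns probability 1/13.1 to every target token
    # has a perplexity of 13.1.
    log_probs = [math.log(1 / 13.1)] * 100
    print(round(perplexity(log_probs), 1))
```

Under this definition, a perplexity of 13.1 corresponds to an average cross-entropy of about 2.57 nats per token.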

(2) Preservation of actions.-- The guidelines submitted to determine all right of any action shall be resolved in the federal register on the final patent repayment plan, a imputed known as a national examiner, a media order, contact authority, and information it determines that a portion of the exemption from tax is allocated. Such center shall not receive such transfers for the total amount of payment of funds with respect to work eligibility under the alternative limit by reason of section 408b.

Compare this to a real snippet from hr1347/text-versions/ih/document.txt:

(2) Preservation of records.--The State shall ensure that the records of the independent redistricting commission are retained in the appropriate State archive in such manner as may be necessary to enable the State to respond to any civil action brought with respect to Congressional redistricting in the State.

CAVEAT: Spacing around punctuation symbols was fixed manually. The model's output has capitalization stripped, so it was re-introduced above.

Real snippet found by searching text for "Preservation of".

Fun Fact: The phrase "Preservation of actions" does not show up in any bill introduced by the 114th House of Representatives.

Serving the Model

You'll need to have TensorFlow Serving installed. See https://tensorflow.github.io/serving/

First, export the model:

python -m lm.export --model_dir=/tmp/house-model --data_path=house-introduced-114 \
                    --hparams=hparams.yaml --export_dir=/tmp/serve/house-model --version=1

Build the default server,

bazel build //tensorflow_serving/model_servers:tensorflow_model_server

and bring it up pointing to our export directory:

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
	--port=9000 \
	--model_base_path=/tmp/serve/house-model \
	--model_name=lm

Querying the Served Model

With a command-line client:

python -m serving.lm_client --server=127.0.0.1:9000 --num_tests=1 --token=5

But also, there's an interactive webpage in the works for presenting the models.

cd site
./start_server.sh

And then in a browser go to localhost:5000. You should see a poor man's bar chart. Honestly, it's awful right now. Just followed the intro D3 tutorial. But just you wait.

Deploying the Web App

Everything should be contained in the app/ directory. Install the dependencies from requirements.txt in a virtualenv:

virtualenv --python=/usr/local/lib/python2.7.13/bin/python env 
source env/bin/activate
pip install -t lib -r requirements.txt

The -t lib is important!

NOTE: Because I was using Ubuntu 14.04, I followed the instructions at http://mbless.de/blog/2016/01/09/upgrade-to-python-2711-on-ubuntu-1404-lts.html to upgrade to Python 2.7.13. This was needed so that the requests library works properly inside the virtualenv.

To use congressapi, you'll have to add an api_keys.py file that defines a PROPUBLICA_CONGRESS_API_KEY constant.
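A minimal api_keys.py might look like the following. Only the constant name comes from the text above. Reading the key from an environment variable (and the variable name itself) is an assumption made here so the key stays out of version control.

```python
# api_keys.py: keep this file out of version control.
import os

# Assumption: read the key from an environment variable of the same name,
# falling back to a placeholder that must be replaced with a real key.
PROPUBLICA_CONGRESS_API_KEY = os.environ.get(
    "PROPUBLICA_CONGRESS_API_KEY", "replace-with-your-key")
```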