Scrapers for US municipal governments.

municipal-scrapers

DataMade's source for municipal scrapers.

To find out more about the ins-and-outs of these scrapers, as well as how to create your own, head on over to docs.opencivicdata.org's scraping page.

Issues?

Issues with the data coming from these scrapers should be filed at the OCD Data issue tracker.

All Open Civic Data issues can be browsed and filed at the Open Civic Data JIRA instance.

Development

Requires Python 3 and PostgreSQL (with the PostGIS extension; see Troubleshooting below).

Initialization

The following assumes that your local database will be called opencivicdata:

pip install -r requirements.txt
createdb opencivicdata
export DATABASE_URL=postgresql:///opencivicdata
pupa dbinit us

Initializing the database may take some time, but if everything goes as expected, you should see output like this:

Operations to perform:
  Apply all migrations: contenttypes, core, legislative, pupa
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying core.0001_initial... OK
  Applying legislative.0001_initial... OK
  Applying legislative.0002_more_extras... OK
  Applying legislative.0003_time_changes... OK
  Applying pupa.0001_initial... OK
  Applying pupa.0002_auto_20150906_1458... OK
  Applying pupa.0003_auto_20151118_0408... OK
  Applying pupa.0004_identifier... OK
  Applying pupa.0005_auto_20170522_1935... OK
  Applying pupa.0006_identifier_jurisdiction... OK
193484 divisions found in the CSV, and 0 already in the DB
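The DATABASE_URL exported above has an empty host (note the three slashes), which conventionally means "connect to the local server over PostgreSQL's default Unix socket", with the database name taken from the path. The sketch below uses only the standard library to show how such a URL breaks down; describe_database_url is a hypothetical helper for illustration, not part of pupa.

```python
from urllib.parse import urlparse


def describe_database_url(url):
    """Split a postgresql:// URL into its connection parts (hypothetical helper)."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        # None when the URL has no host, i.e. a local Unix-socket connection
        "host": parsed.hostname,
        "port": parsed.port,
        "user": parsed.username,
        # The path, minus its leading slash, is the database name
        "dbname": parsed.path.lstrip("/"),
    }


print(describe_database_url("postgresql:///opencivicdata"))
# {'scheme': 'postgresql', 'host': None, 'port': None, 'user': None, 'dbname': 'opencivicdata'}
```

A database on another machine would instead use the full form, e.g. postgresql://user:password@host:5432/opencivicdata.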

Finally, initialize your new scraper (if you so desire):

pupa init YOUR_CITY_SCRAPER

Troubleshooting

Your database needs the postgis extension. If it is not already installed and your database user is not a superuser, running pupa dbinit us may fail with:

django.db.utils.ProgrammingError: permission denied to create extension "postgis"
HINT:  Must be superuser to create this extension.

Create the extension yourself while connected as a PostgreSQL superuser, then rerun pupa dbinit us:

psql -d opencivicdata
CREATE EXTENSION postgis;
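To confirm the extension is present before rerunning pupa dbinit us, you can query PostgreSQL's pg_extension catalog. This is a minimal sketch assuming a DB-API 2.0 connection such as one from psycopg2 (which Django's PostgreSQL backend relies on); postgis_installed is a hypothetical helper, not part of pupa.

```python
def postgis_installed(conn):
    """Return True if the postgis extension exists in the connected database.

    `conn` is assumed to be a DB-API 2.0 connection to PostgreSQL, for example
    psycopg2.connect(dbname="opencivicdata").
    """
    cur = conn.cursor()
    try:
        # pg_extension lists the extensions installed in the current database
        cur.execute("SELECT 1 FROM pg_extension WHERE extname = %s", ("postgis",))
        return cur.fetchone() is not None
    finally:
        cur.close()
```

The check is scoped to one database, so connect to opencivicdata rather than the default postgres database.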

At times, the release of ocd-django on PyPI differs from the version on GitHub. This can cause problems when you need to create and run migrations. Specifically, you might encounter an ImproperlyConfigured error with the following message:

You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.

Fix the problem by running:

export DJANGO_SETTINGS_MODULE=pupa.settings

Then, you should be able to successfully run:

django-admin makemigrations
django-admin migrate
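If you drive migrations from a Python script rather than the shell, the same setting can be applied inside the process before Django is imported. A minimal sketch: ensure_pupa_settings and make_and_apply_migrations are hypothetical helpers, while django.setup() and django.core.management.call_command are standard Django APIs.

```python
import os


def ensure_pupa_settings():
    """Point Django at pupa's settings unless DJANGO_SETTINGS_MODULE is already set.

    Same effect as `export DJANGO_SETTINGS_MODULE=pupa.settings`, but scoped to
    the current Python process.
    """
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "pupa.settings")
    return os.environ["DJANGO_SETTINGS_MODULE"]


def make_and_apply_migrations():
    """Run makemigrations, then migrate, from Python (hypothetical helper)."""
    ensure_pupa_settings()
    # Import Django only after the settings module has been configured
    import django
    from django.core.management import call_command

    django.setup()
    call_command("makemigrations")
    call_command("migrate")
```

Using setdefault means an existing DJANGO_SETTINGS_MODULE in your environment still wins.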

Testing

Before submitting a PR, please run pupa update YOUR_CITY_SCRAPER against your local database:

export DATABASE_URL=postgresql:///opencivicdata
pupa update YOUR_CITY_SCRAPER
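If you want to run this check from Python (for example, across several scrapers before opening a PR), the sketch below wraps the same command with subprocess. It assumes the pupa executable is on your PATH; build_update_invocation and run_update are hypothetical helpers.

```python
import os
import subprocess


def build_update_invocation(scraper, database_url="postgresql:///opencivicdata"):
    """Return the command and environment for `pupa update <scraper>`."""
    env = dict(os.environ)  # copy so the caller's environment is left untouched
    env["DATABASE_URL"] = database_url
    return ["pupa", "update", scraper], env


def run_update(scraper, database_url="postgresql:///opencivicdata"):
    """Run the scraper; raises CalledProcessError if pupa exits non-zero."""
    cmd, env = build_update_invocation(scraper, database_url)
    subprocess.run(cmd, env=env, check=True)
```

For example, run_update("YOUR_CITY_SCRAPER"), substituting your scraper's module name. With check=True, a failing scrape stops the script instead of passing silently.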

Making changes to this fork

All changes in this repo should be rebased onto opencivicdata/scrapers-us-municipal.

To do this, first pull the latest changes from the fork's origin:

git pull origin master

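Then make your changes and commit them locally. Next, rebase your changes onto upstream master (where upstream is the remote pointing at opencivicdata/scrapers-us-municipal), and push the result to the fork: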

git pull --rebase upstream master
git push origin master

Handling unresolved bills

For LA Metro, we have alerting set up to notify us if pupa is not able to resolve a board report associated with an agenda item. This is an important diagnostic tool to help us track down why some board reports are not getting scraped.

Some board reports are never scraped because they have not been made public and never will be. When LA Metro tells us that a board report will not be made public, "ignore" that alert in Sentry, and add a comment noting that LA Metro told us this board report will not be made public.

We take this approach instead of editing the scraper to skip certain bills because LA Metro may change its mind about what to make public in the future.