unitedstates/congress

BILLSTATUS fails to download

GoldenJoe opened this issue · 1 comment

docker run -t --rm -v /Users/GoldenJoe/Desktop/Congress2020:/congress unitedstates/congress govinfo --collections=BILLSTATUS --congress=116

This produces failed downloads:

Downloading: https://www.govinfo.gov/sitemap/BILLSTATUS_sitemap_index.xml
Downloading: https://www.govinfo.gov/sitemap/BILLSTATUS_2019_sitemap.xml
Downloading: data/govinfo/BILLSTATUS/116s394/package.zip
Error downloading https://www.govinfo.gov/content/pkg/BILLSTATUS-116s394.zip:

Traceback (most recent call last):
  File "/opt/theunitedstates.io/congress/tasks/utils.py", line 313, in download
    scraper.urlretrieve(url, cache_path, **urlopen_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapelib/__init__.py", line 346, in urlretrieve
    result = self.request(method, url, data=body, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapelib/__init__.py", line 296, in request
    raise HTTPError(resp)
HTTPError: 404 while retrieving https://www.govinfo.gov/wssearch/content/pkg/content/

Downloading: data/govinfo/BILLSTATUS/116s852/package.zip
Error downloading https://www.govinfo.gov/content/pkg/BILLSTATUS-116s852.zip:

Traceback (most recent call last):
  File "/opt/theunitedstates.io/congress/tasks/utils.py", line 313, in download
    scraper.urlretrieve(url, cache_path, **urlopen_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapelib/__init__.py", line 346, in urlretrieve
    result = self.request(method, url, data=body, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapelib/__init__.py", line 296, in request
    raise HTTPError(resp)
HTTPError: 404 while retrieving https://www.govinfo.gov/wssearch/content/pkg/content/

And so on. It looks like BILLSTATUS data is no longer stored this way.

Line 336 of govinfo.py is where the URL is generated. It will probably need to be updated, but I'm not sure to what.
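To pin down what that line has to produce, here is a rough sketch that rebuilds the two strings visible in the log: the local cache path (data/govinfo/BILLSTATUS/116s394/package.zip) and the package ZIP URL (https://www.govinfo.gov/content/pkg/BILLSTATUS-116s394.zip). The helpers `package_url` and `cache_path` are hypothetical names made up for illustration; they are not the actual code in govinfo.py. Since requests built on this URL pattern ended in a 404 (reported for .../wssearch/content/pkg/content/), the URL pattern, not the cache path, is presumably the part that needs changing.

```python
# Hypothetical helpers, NOT the real govinfo.py code: they only reproduce the
# URL and cache path strings that appear in the download log.
import posixpath

GOVINFO_BASE = "https://www.govinfo.gov"


def package_url(collection, package_id):
    """Package ZIP URL as seen in the log, e.g.
    https://www.govinfo.gov/content/pkg/BILLSTATUS-116s394.zip"""
    return "%s/content/pkg/%s-%s.zip" % (GOVINFO_BASE, collection, package_id)


def cache_path(collection, package_id, data_dir="data"):
    """Local cache path as seen in the log, e.g.
    data/govinfo/BILLSTATUS/116s394/package.zip"""
    return posixpath.join(data_dir, "govinfo", collection, package_id, "package.zip")


if __name__ == "__main__":
    # The two packages from the log above.
    for pkg in ("116s394", "116s852"):
        print(cache_path("BILLSTATUS", pkg), "<-", package_url("BILLSTATUS", pkg))
```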

Poking around, I found this: https://www.govinfo.gov/content/pkg/HOB-2019.zip. Is this the correct information?

I had a typo. The option should be --bulkdata, not --collections:

docker run -t --rm -v /Users/GoldenJoe/Desktop/Congress2020:/congress unitedstates/congress govinfo --bulkdata=BILLSTATUS --congress=116

Maybe it would be helpful to add a warning if an unexpected option is specified?
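A minimal sketch of what such a warning could look like, assuming options arrive as `--key=value` or bare `--flag` arguments. The functions `parse_options` and `warn_unknown_options`, and the set of known option names, are hypothetical and only for illustration; the real option handling and the full list of options the govinfo task accepts live in the project's own code.

```python
# Hypothetical sketch: warn about --options that a task does not recognize.
import warnings


def parse_options(argv):
    """Parse --key=value and bare --flag arguments into a dict (bare flags map to True)."""
    options = {}
    for arg in argv:
        if not arg.startswith("--"):
            continue  # positional arguments such as the task name are skipped
        key, sep, value = arg[2:].partition("=")
        options[key] = value if sep else True
    return options


def warn_unknown_options(options, known):
    """Emit one warning per option name not in `known`; return those names sorted."""
    unknown = sorted(set(options) - set(known))
    for name in unknown:
        warnings.warn("unrecognized option --%s" % name, stacklevel=2)
    return unknown


if __name__ == "__main__":
    # Hypothetical option list, for illustration only.
    known = {"bulkdata", "collections", "congress"}
    # "--congres" is a deliberate typo to trigger the warning.
    opts = parse_options(["govinfo", "--bulkdata=BILLSTATUS", "--congres=116"])
    print(warn_unknown_options(opts, known))
```

Returning the unknown names as well as warning keeps the check easy to test, and a task could choose to abort instead of continuing when the list is non-empty.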