BILLSTATUS fails to download
GoldenJoe opened this issue · 1 comment
docker run -t --rm -v /Users/GoldenJoe/Desktop/Congress2020:/congress unitedstates/congress govinfo --collections=BILLSTATUS --congress=116
Running the command above produces failed downloads:
Downloading: https://www.govinfo.gov/sitemap/BILLSTATUS_sitemap_index.xml
Downloading: https://www.govinfo.gov/sitemap/BILLSTATUS_2019_sitemap.xml
Downloading: data/govinfo/BILLSTATUS/116s394/package.zip
Error downloading https://www.govinfo.gov/content/pkg/BILLSTATUS-116s394.zip:
Traceback (most recent call last):
  File "/opt/theunitedstates.io/congress/tasks/utils.py", line 313, in download
    scraper.urlretrieve(url, cache_path, **urlopen_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapelib/__init__.py", line 346, in urlretrieve
    result = self.request(method, url, data=body, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapelib/__init__.py", line 296, in request
    raise HTTPError(resp)
HTTPError: 404 while retrieving https://www.govinfo.gov/wssearch/content/pkg/content/
Downloading: data/govinfo/BILLSTATUS/116s852/package.zip
Error downloading https://www.govinfo.gov/content/pkg/BILLSTATUS-116s852.zip:
Traceback (most recent call last):
  File "/opt/theunitedstates.io/congress/tasks/utils.py", line 313, in download
    scraper.urlretrieve(url, cache_path, **urlopen_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapelib/__init__.py", line 346, in urlretrieve
    result = self.request(method, url, data=body, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapelib/__init__.py", line 296, in request
    raise HTTPError(resp)
HTTPError: 404 while retrieving https://www.govinfo.gov/wssearch/content/pkg/content/
And so on. Looks like bill status is no longer stored this way.
The URL is generated at line 336 of govinfo.py. It will probably need to be updated, but I'm not sure what it should be changed to.
Poking around, I found this: https://www.govinfo.gov/content/pkg/HOB-2019.zip. Is this the correct information?
I had a typo in my command. It should be:
docker run -t --rm -v /Users/GoldenJoe/Desktop/Congress2020:/congress unitedstates/congress govinfo --bulkdata=BILLSTATUS --congress=116
Maybe it would be helpful to add a warning if an unexpected option is specified?
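To make the suggestion concrete, here is a rough sketch of what such a warning could look like. I haven't checked how the project actually parses its options, so everything here is an assumption: `KNOWN_OPTIONS`, `parse_options`, `find_unexpected`, and `warn_unexpected` are hypothetical names, and the set of known options is just the ones that appear in this thread, not the task's real list.

```python
import warnings

# Hypothetical list of options the task understands. These are only the
# options mentioned in this issue; the real task likely accepts more.
KNOWN_OPTIONS = {"collections", "bulkdata", "congress"}


def parse_options(argv):
    """Turn ['--key=value', '--flag'] into {'key': 'value', 'flag': True}."""
    options = {}
    for arg in argv:
        if arg.startswith("--"):
            key, _, value = arg[2:].partition("=")
            options[key] = value if value else True
    return options


def find_unexpected(options, known=KNOWN_OPTIONS):
    """Return the sorted names of options that are not in the known set."""
    return sorted(name for name in options if name not in known)


def warn_unexpected(options, known=KNOWN_OPTIONS):
    """Emit one warning per unexpected option and return their names."""
    unexpected = find_unexpected(options, known)
    for name in unexpected:
        warnings.warn(
            "Unexpected option --%s (known options: %s)"
            % (name, ", ".join(sorted(known)))
        )
    return unexpected


if __name__ == "__main__":
    # A misspelled option name should be reported instead of silently ignored.
    opts = parse_options(["--bulkdta=BILLSTATUS", "--congress=116"])
    warn_unexpected(opts)
```

Note that a check like this only catches misspelled or unknown option names. It would not flag a recognized option used with a value it can't handle, which would need a separate check on the values.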