unitedstates/congress

Error in parsing sponsor & byRequest

aih opened this issue · 4 comments

aih commented

When running the 'bills' scraper, I get an error on some bills from the 110th Congress that blocks further processing. It is caused by a missing sponsor in these lines:

'by_request': bill_dict['sponsors']['item'][0]['byRequestType'] is not None,
'sponsor': bill_info.sponsor_for(bill_dict['sponsors']['item'][0]),

I am temporarily bypassing this in my local code by defining by_request and sponsor earlier in the code like this:

if bill_dict.get('sponsors') and bill_dict['sponsors'].get('item') and len(bill_dict['sponsors']['item']) > 0:
    sponsor = bill_info.sponsor_for(bill_dict['sponsors']['item'][0])
else:
    sponsor = None

byRequestTypeExists = sponsor and bill_dict['sponsors']['item'][0].get('byRequestType')
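
The same guard could be pulled into a small helper so that both fields come from one lookup. This is only a sketch, not the scraper's actual code: `first_sponsor_item` and `sponsor_fields` are hypothetical names, `sponsor_for` stands in for `bill_info.sponsor_for`, and it assumes that `bill_dict['sponsors']['item']` is a list whenever it is present and that the request flag lives under the `byRequestType` key used in the lines quoted above.

```python
def first_sponsor_item(bill_dict):
    """Return the first sponsor entry, or None when the bill has no sponsor.

    Assumption: 'sponsors' may be missing or None, and 'item' (when present)
    is a list of sponsor dicts.
    """
    sponsors = bill_dict.get('sponsors') or {}
    items = sponsors.get('item') or []
    return items[0] if items else None


def sponsor_fields(bill_dict, sponsor_for):
    """Build the 'sponsor' and 'by_request' values without raising on
    sponsor-less bills. `sponsor_for` is passed in so the sketch stays
    self-contained (hypothetical stand-in for bill_info.sponsor_for)."""
    item = first_sponsor_item(bill_dict)
    if item is None:
        return {'sponsor': None, 'by_request': False}
    return {
        'sponsor': sponsor_for(item),
        'by_request': item.get('byRequestType') is not None,
    }
```

With this shape, a bill with no sponsor yields `sponsor` of `None` and `by_request` of `False` instead of stopping the run.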

Is this a new issue? Am I doing something wrong in how I'm running the scraper?

The 110th Congress is new data, and you might be the first to run the scraper on it, so there might be some upstream issues. Can you paste a bill number so I can try it?

aih commented

Unfortunately, I didn't log the info of which bills it failed on. But... I could make a PR if you think the solution above is reasonable.

I think there might be something else going on. It's probably a "reserved" bill number without a sponsor, and for those we might be better off just skipping them entirely. This happened around the same time on some current bills. I might have fixed it in 28593b1.
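
For illustration, skipping sponsor-less entries could look roughly like the sketch below. It is not a description of what 28593b1 does: `has_sponsor` and `bills_to_process` are hypothetical names, and the `'number'` key used in the log message is an assumed identifier field. Logging each skipped bill also records which bills were affected.

```python
import logging

logger = logging.getLogger(__name__)


def has_sponsor(bill_dict):
    """True when the parsed bill has at least one sponsor entry.

    Assumption: 'sponsors' may be missing or None, and 'item' is a list.
    """
    sponsors = bill_dict.get('sponsors') or {}
    return bool(sponsors.get('item'))


def bills_to_process(bill_dicts):
    """Yield only bills that have a sponsor, logging the ones skipped."""
    for bill_dict in bill_dicts:
        if not has_sponsor(bill_dict):
            # 'number' is an assumed key, used here only to identify the bill.
            logger.warning("Skipping bill without sponsor: %s",
                           bill_dict.get('number', '<unknown>'))
            continue
        yield bill_dict
```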

aih commented

Probably so... I pulled your changes from 28593b1 after I had put in my changes above, so I wouldn't have seen whether the new commit fixes the issue. In any case, I'll close this issue, and will re-open if it comes back again.