unitedstates/congress

Vote format has changed for House 2020?

demongolem opened this issue · 6 comments

Here is one that is not Python 3 :)

I am running the code in vote.py and I see that the regex on the vote ID is failing. Instead of the 4 parts that were expected, some vote strings have a 5th part. The 4th part used to be the year, but in these strings the 5th part is the year and the 4th part is something I have not identified yet. Here is an example string. Perhaps the format has changed and newer vote IDs need separate processing.

h102-116.5.2020

For the regex, I have something like this in split_vote_id, which is actually in utils.py. Maybe I am missing the end $ in mine, but in any case an additional number group representing the 5 above would need to be added.

    return re.match("^(h|s)(\d+)\-(\d+)\.(\d+)\.(\d\d\d\d|[0-9A-Z])", vote_id).groups()
    #return re.match("^(h|s)(\d+)-(\d+).(\d\d\d\d|[0-9A-Z])$", vote_id).groups()

I run these scripts every few hours every day to pull in new data and haven't been having a problem.

What command line are you using? Where is this vote id coming from?

I use ./run votes. A typical value for vote_id at the above line which I commented out is

h102-116.5.2020h102-116.5.2020h102-116.5.2020h102-116.5.2020h102-116.5.2020

except that they are unique IDs concatenated together (not the same vote ID over and over again), which need to be split. (I don't have the output in front of me right now.)

At https://github.com/unitedstates/congress/wiki/votes I see the vote id looks like

"vote_id": "h202-113.2013"

"h202-113.2013" is what the vote IDs should look like. I'm not sure where the .5 is coming from.

Can you post a stack trace when you get a chance? Hopefully that'll point us in the right direction. :)

When I was logging these vote_ids to disk, I omitted a newline :(. So vote_id really is just a single vote ID of the form I indicated.

When I do ./run votes, here is the beginning of the output I get

Going to fetch 102 votes from congress #116.5 session 2020
h102-116.5.2020
h101-116.5.2020
h100-116.5.2020
h99-116.5.2020
h98-116.5.2020
h97-116.5.2020
h96-116.5.2020
h95-116.5.2020
h94-116.5.2020
h93-116.5.2020
h92-116.5.2020
h91-116.5.2020
h90-116.5.2020
h89-116.5.2020
h88-116.5.2020
h87-116.5.2020
h86-116.5.2020
h85-116.5.2020

And here is the stack trace I get with the regex as it was:

[h1-116.5.2020] Exception:

Traceback (most recent call last):
  File "/home/gwerner/from_greg/congress/tasks/utils.py", line 182, in process_set
    results = fetch_func(id, options, *extra_args)
  File "/home/gwerner/from_greg/congress/tasks/vote_info.py", line 15, in fetch_vote
    vote_chamber, vote_number, vote_congress, vote_session_year = utils.split_vote_id(vote_id)
  File "/home/gwerner/from_greg/congress/tasks/utils.py", line 156, in split_vote_id
    return re.match("^(h|s)(\d+)-(\d+).(\d\d\d\d|[0-9A-Z])$", vote_id).groups()
AttributeError: 'NoneType' object has no attribute 'groups'

If I go to an online python regex validator, obviously there will be no matches for the vote_ids which I have supplied.
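For anyone who wants to reproduce this without an online validator, here is a minimal sketch. The pattern is a slightly tightened variant of the one in the trace (raw string, escaped dot), and `split_vote_id_checked` is a hypothetical helper name, not something in the repository. It only illustrates that `re.match` returns `None` for an ID with the extra ".5", which is what leads to the `AttributeError` when `.groups()` is called on the result.

```python
import re

# Assumed variant of the pattern from utils.py: raw string, literal dot.
VOTE_ID_PATTERN = re.compile(r"^(h|s)(\d+)-(\d+)\.(\d\d\d\d|[0-9A-Z])$")


def split_vote_id_checked(vote_id):
    """Hypothetical helper: split a vote ID, or fail with a readable error."""
    match = VOTE_ID_PATTERN.match(vote_id)
    if match is None:
        # Without this check, calling .groups() on None raises AttributeError.
        raise ValueError("Unexpected vote ID format: %r" % vote_id)
    return match.groups()


# The documented form splits into chamber, number, congress, session.
print(split_vote_id_checked("h202-113.2013"))

# The ID from this issue has an extra ".5" and does not match.
try:
    split_vote_id_checked("h102-116.5.2020")
except ValueError as exc:
    print(exc)
```

Raising a `ValueError` that names the offending ID would have pointed at the malformed congress number sooner than the bare `AttributeError` did.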

I'm going to go out on a limb here and say that you are somehow running this with Python 3 or non-standard Python 2 command-line arguments? "116.5" looks like 116-and-a-half which suggests some Python 3 division is happening.

Yes, I see where the division is happening in utils.py

    def congress_from_legislative_year(year):
        return ((year + 1) / 2) - 894

Of course, in Python 3 that would need to be // instead of /.
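A minimal sketch of the difference, assuming Python 3. The two function names are made up for the comparison. The first body is the expression quoted from utils.py and the second swaps in floor division.

```python
def congress_true_division(year):
    # Same expression as in utils.py; under Python 3, "/" is true division.
    return ((year + 1) / 2) - 894


def congress_floor_division(year):
    # "//" floors the result, which is what "/" did for ints in Python 2.
    return ((year + 1) // 2) - 894


print(congress_true_division(2020))   # 116.5 under Python 3
print(congress_floor_division(2020))  # 116
```

The 116.5 from the first function is what ends up in the "congress #116.5" log line and in vote IDs such as h102-116.5.2020.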