tmccarthy/ausvotes

Data driven exploration results


Hi, Tim - thanks for this excellent example of using public data!

It inspired me to ask whether there's merit in a purely data-driven approach: find and count the unique preference strings for each state. These can then be compared to the how-to-vote (HTV) cards, but will also include non-HTV patterns such as donkey voting.

Trivial, fugly code and some findings at https://github.com/fubar2/aus_senate
I found '/' and '*' in the csv preference data - any idea what they are supposed to be? I just converted them to '0' to ignore...
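
The counting step is roughly the following. This is only a minimal sketch, not the code in the repo above: it assumes each ballot arrives as one comma-separated string with blanks for unnumbered boxes, and the names `normalise` and `count_patterns` are made up for illustration. It treats `/` and `*` as `0`, as described above.

```python
from collections import Counter

def normalise(raw, mark_value="0"):
    """Turn one raw preference string into a canonical pattern.

    Assumption: boxes are comma-separated and an empty cell means the
    box was left blank. '/' and '*' are replaced with mark_value.
    """
    cells = []
    for cell in raw.split(","):
        cell = cell.strip()
        if cell in ("/", "*"):
            cell = mark_value
        elif cell == "":
            cell = "0"
        cells.append(cell)
    return ",".join(cells)

def count_patterns(raw_strings):
    """Count how often each normalised preference pattern occurs."""
    return Counter(normalise(s) for s in raw_strings)

# Tiny made-up sample, not real AEC data.
ballots = ["6,4,,5,1,3,2", "6,4,,5,1,3,2", "1,2,3,4,5,6,7", "/,,,,,,"]
for pattern, count in count_patterns(ballots).most_common():
    print(pattern, count, sep="\t")
```

Sorting by count with `most_common()` gives the same shape as the table below.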

For example, in the NT data, the top 6 duplicated patterns and their counts are:

==> NT_table.tab <==
Preferences     Count
6,4,0,5,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0     6209
0,3,6,2,5,1,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0     6021
0,3,5,1,4,2,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0     1284
1,2,3,4,5,6,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0     678
1,2,3,4,5,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0     538
6,4,5,0,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0     518

The first and last look like the Country Liberal ticket, but with 518 voters swapping their 5th preference from the Greens to the Citizens Electoral Council. That could be green-o-phobia, or perhaps a transcription error, since the HTV does not show box C.

Good stuff - happy to talk if you are interested....

Hi Ross 👋!

I absolutely agree that there's merit to that approach. In terms of donkey votes, I've actually previously done some analysis on those in 2016. I put together a site at tmccarthy.github.io/SenateDB with a section on donkey votes.

Your insight about voters swapping their preferences is an interesting one! I hadn't considered that at all. May be interesting to see not just the rates at which HTV cards are used, but also apparent transcription errors and minor deviations. Something to look into!

As for the / and * characters, I got in touch with the AEC a few years ago and learned that they represent ticks (/) and crosses (*). These are considered formal as part of the savings provisions in section 269 of the Electoral Act, and are equivalent to marking a square with a 1. I ignored these for the purposes of matching how-to-vote cards.

Thanks for reaching out! I'm certainly intrigued by the question of how often people make mistakes transcribing how-to-vote cards. My current focus is generalising what I have written to be used for NSW Legislative Council results, which are also available online. Maybe you can take a look at those too.

Hi, Tim.
Thanks for the response.
Will fix my code so / and * become 1 - thanks for the insight from the AEC.
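
Something like this, as a sketch only. The function name and the comma-separated input format are my assumptions, and it does not try to handle a ballot that has both a mark and a written 1:

```python
# Ticks and crosses are treated as a first preference.
MARKS_AS_FIRST_PREFERENCE = {"/": "1", "*": "1"}

def normalise_marks(raw):
    """Map '/' and '*' to '1' and blank boxes to '0' in one preference string."""
    cells = (c.strip() for c in raw.split(","))
    return ",".join(MARKS_AS_FIRST_PREFERENCE.get(c, c or "0") for c in cells)

print(normalise_marks("/,,2,3"))
```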

Was thinking about estimating edit distances to look for patterns that need no more than (e.g.) one transposition to match a more common pattern, as a way of estimating "close, but no cigar" matches to an HTV, like that NT CLP case I mentioned above.

Impossible to distinguish transcription errors from recalcitrance reliably since we can't get into the voter's head, but it might be amusing :)

Will take a look at the NSW LC data too.

(update a few hours later...)

I decided to see how many of the top patterns have only one box different or an adjacent box transposition. Turns out that the top pattern has a few likely meatware errors if you allow one transposition or a Hamming distance of 1.
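
The two checks can be sketched like this. It is a simplified illustration rather than the exact code I ran: the function names are hypothetical, positions are 0-based, and the example patterns are the NT ones with the trailing zeros trimmed.

```python
def parse(pattern):
    """Convert '6,4,0,5' into a tuple of ints."""
    return tuple(int(x) for x in pattern.split(","))

def differing_positions(a, b):
    """Indices at which two equal-length patterns disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def classify(a, b):
    """Describe how b differs from a, or return None if it is not 'close'.

    'Close' means exactly one box differs (Hamming distance 1) or two
    adjacent boxes have swapped values.
    """
    if len(a) != len(b):
        return None
    diff = differing_positions(a, b)
    if len(diff) == 1:
        return f"Hamming=1 difference at position {diff[0]}"
    if len(diff) == 2:
        i, j = diff
        if j == i + 1 and a[i] == b[j] and a[j] == b[i]:
            return f"Transposition of positions {i} and {j}"
    return None

top = parse("6,4,0,5,1,3,2")
print(classify(top, parse("6,4,5,0,1,3,2")))  # Transposition of positions 2 and 3
print(classify(top, parse("6,4,7,5,1,3,2")))  # Hamming=1 difference at position 2
print(classify(top, parse("1,2,3,4,5,6,7")))  # None
```

Note that an adjacent transposition is a Hamming distance of 2, which is why it needs its own check.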

For the NT (see below) that means the CLP HTV count could probably be 6208 + 517 + 346 ?

### Transposition of positions 2 and 3
 #0 = 6,4,0,5,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 #5 = 6,4,5,0,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
### Hamming=1 difference at position 2
 #0 = 6,4,0,5,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 #8 = 6,4,7,5,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
### Transposition of positions 1 and 2
 #0 = 6,4,0,5,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 #16 = 6,0,4,5,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
### Hamming=1 difference at position 6
 #3 = 1,2,3,4,5,6,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 #4 = 1,2,3,4,5,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
### Transposition of positions 4 and 5
 #6 = 0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 #7 = 0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
### Transposition of positions 0 and 1
 #9 = 0,6,4,5,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 #16 = 6,0,4,5,1,3,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
### Hamming=1 difference at position 0
 #11 = 7,6,5,4,1,2,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
 #14 = 0,6,5,4,1,2,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0