Tools to pull and transform data from the Oireachtas API.
A cron job runs deploy/nightly every weeknight at 23:30. This script:
- Pulls all votes for a time period, given by startDate and endDate (format yyyy-mm-dd).
- Loops over the results and uses the contextDate field to pull the full XML file for that day's activity. Currently (31 August 2020) the API is broken for pulling XML for individual debates. The URL to pull the full day's data is https://data.oireachtas.ie/akn/ie/debateRecord/dail/${contextDate}/debate/mul@/main.xml
- This XML file contains all of the data for the day: speeches, votes, results and all related data. It is relatively large (the file for 2020-07-30 is 1 MB), and parsing it is not straightforward.
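A minimal Python sketch of the date handling in the steps above. It assumes the votes response is JSON shaped like {"results": [{"contextDate": "yyyy-mm-dd", ...}, ...]}; that shape, and the names check_date and day_xml_urls, are illustrative assumptions rather than taken from the nightly script. Because several votes can fall on the same day, the dates are de-duplicated so each day's XML file is fetched only once. The actual download (for example with urllib.request.urlopen) is left out so the sketch runs offline.

```python
import re
from datetime import date

# Day-level XML URL from this README, with contextDate substituted in.
DAY_XML_URL = (
    "https://data.oireachtas.ie/akn/ie/debateRecord/dail/"
    "{context_date}/debate/mul@/main.xml"
)

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_date(value: str) -> str:
    """Ensure a startDate/endDate/contextDate string is a real yyyy-mm-dd date."""
    if not DATE_RE.match(value):
        raise ValueError(f"expected yyyy-mm-dd, got {value!r}")
    date.fromisoformat(value)  # also rejects impossible dates such as 2020-02-30
    return value


def day_xml_urls(votes_response: dict) -> list[str]:
    """Build one day-XML URL per distinct contextDate in a votes response.

    ASSUMPTION (hypothetical shape): {"results": [{"contextDate": ...}, ...]}.
    """
    dates = sorted(
        {check_date(r["contextDate"]) for r in votes_response.get("results", [])}
    )
    return [DAY_XML_URL.format(context_date=d) for d in dates]


# Three votes on two days produce two URLs.
sample = {"results": [{"contextDate": "2020-07-30"},
                      {"contextDate": "2020-07-29"},
                      {"contextDate": "2020-07-30"}]}
for url in day_xml_urls(sample):
    print(url)
```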
Near the top of the XML file there is an analysis block:
<analysis source="#source">
<parliamentary>
<voting eId="vote_37" href="#sum_34" outcome="#lost" refersTo="#sum_33">
<count eId="vote_37-count_1" href="#qty_1" refersTo="#ta" value="52"/>
<count eId="vote_37-count_2" href="#qty_2" refersTo="#nil" value="67"/>
<count eId="vote_37-count_3" href="#qty_3" refersTo="#staon" value="0"/>
</voting>
<voting eId="vote_38" href="#sum_42" outcome="#lost" refersTo="#sum_41">
<count eId="vote_38-count_1" href="#qty_4" refersTo="#ta" value="57"/>
<count eId="vote_38-count_2" href="#qty_5" refersTo="#nil" value="72"/>
<count eId="vote_38-count_3" href="#qty_6" refersTo="#staon" value="0"/>
</voting>
<voting eId="vote_39" href="#sum_49" outcome="#lost" refersTo="#sum_48">
<count eId="vote_39-count_1" href="#qty_7" refersTo="#ta" value="58"/>
<count eId="vote_39-count_2" href="#qty_8" refersTo="#nil" value="77"/>
<count eId="vote_39-count_3" href="#qty_9" refersTo="#staon" value="0"/>
</voting>
This looks like a summary of all the votes for that day, and I can use the href and refersTo attributes to find all the appropriate info.
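One way to read those summaries, sketched with Python's standard xml.etree.ElementTree. The function parse_votes and its output dictionary are hypothetical. The real main.xml may declare an Akoma Ntoso namespace on its elements, so tags are compared with any namespace stripped; that is an assumption about the file, not something checked here.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def parse_votes(xml_text: str) -> list[dict]:
    """Summarise each <voting> element: ids, outcome and ta/nil/staon counts."""
    root = ET.fromstring(xml_text)
    votes = []
    for el in root.iter():
        if local_name(el.tag) != "voting":
            continue
        counts = {}
        for child in el:
            if local_name(child.tag) == "count":
                # refersTo is e.g. "#ta"; the leading '#' is dropped for the key
                counts[child.get("refersTo", "").lstrip("#")] = int(child.get("value", "0"))
        votes.append({
            "eId": el.get("eId"),
            "outcome": el.get("outcome", "").lstrip("#"),  # e.g. "lost"
            "refersTo": el.get("refersTo"),  # e.g. "#sum_33", the "Amendment put:" summary
            "href": el.get("href"),          # e.g. "#sum_34", the division summary
            "counts": counts,
        })
    return votes


# The first <voting> element from the excerpt above, with the enclosing tags closed.
snippet = """<analysis source="#source"><parliamentary>
<voting eId="vote_37" href="#sum_34" outcome="#lost" refersTo="#sum_33">
<count eId="vote_37-count_1" href="#qty_1" refersTo="#ta" value="52"/>
<count eId="vote_37-count_2" href="#qty_2" refersTo="#nil" value="67"/>
<count eId="vote_37-count_3" href="#qty_3" refersTo="#staon" value="0"/>
</voting>
</parliamentary></analysis>"""

for vote in parse_votes(snippet):
    print(vote["eId"], vote["outcome"], vote["counts"])
    # vote_37 lost {'ta': 52, 'nil': 67, 'staon': 0}
```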
For example, following refersTo="#sum_33" for the first vote above, I can get the vote details: outcome and who voted.
This section starts with:
<debateSection name="division" refersTo="#bill.2020.17.dail." eId="dbsect_24">
<summary eId="sum_33">Amendment put: </summary>
<summary title="division" eId="sum_34">The Committee divided: Tá,
<quantity refersTo="#ta" normalized="52" eId="qty_1">52</quantity>
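Following a reference such as refersTo="#sum_33" amounts to finding the element whose eId matches and walking up to its enclosing debateSection. ElementTree keeps no parent pointers, so this sketch builds a child-to-parent map first. follow_ref and quantities are hypothetical helpers, and the usage closes the excerpt after the first quantity and wraps it in a made-up root element only so that it parses.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def follow_ref(root: ET.Element, ref: str):
    """Resolve a reference such as "#sum_33" to (element, enclosing debateSection).

    Returns (None, None) if no element carries the referenced eId.
    """
    target = ref.lstrip("#")
    # ElementTree has no parent pointers, so map every child to its parent.
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():
        if el.get("eId") == target:
            section = parents.get(el)
            while section is not None and local_name(section.tag) != "debateSection":
                section = parents.get(section)
            return el, section
    return None, None


def quantities(summary: ET.Element) -> dict:
    """Map each <quantity> inside a division summary to its normalized number."""
    return {
        q.get("refersTo", "").lstrip("#"): int(q.get("normalized", "0"))
        for q in summary.iter()
        if local_name(q.tag) == "quantity"
    }


fragment = """<root>
<debateSection name="division" refersTo="#bill.2020.17.dail." eId="dbsect_24">
<summary eId="sum_33">Amendment put: </summary>
<summary title="division" eId="sum_34">The Committee divided: Tá,
<quantity refersTo="#ta" normalized="52" eId="qty_1">52</quantity></summary>
</debateSection>
</root>"""
root = ET.fromstring(fragment)

summary, section = follow_ref(root, "#sum_33")  # the vote's refersTo
print(summary.text.strip())                     # Amendment put:
print(section.get("refersTo"))                  # #bill.2020.17.dail.
division, _ = follow_ref(root, "#sum_34")       # the vote's href
print(quantities(division))                     # {'ta': 52}
```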
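The refersTo value on this division section, bill.2020.17.dail., can be used to find the debate; in this example, the debate section's refersTo (#bill.2020.17.dail.3_sub) begins with the same value: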
<debateSection name="debate" refersTo="#bill.2020.17.dail.3_sub" eId="dbsect_23">
<heading>Residential Tenancies and Valuation Bill 2020: Committee and Remaining Stages
<recordedTime time="2020-07-30T15:20:00+01:00"/>
</heading>
<summary eId="sum_31"> Sections 1 and 2 agreed to.</summary>
This section contains the full text of the debate and other details.
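A sketch of that last step, assuming the prefix relationship seen in this one example (bill.2020.17.dail. on the division, #bill.2020.17.dail.3_sub on the debate) holds in general; that is inferred from the example, not documented. find_debates is a hypothetical helper that returns every matching debate section, with its heading and recordedTime, rather than guessing which one is meant.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_debates(root: ET.Element, division_ref: str) -> list[dict]:
    """List debate sections whose refersTo starts with a division's refersTo.

    ASSUMPTION: prefix matching is inferred from a single example.
    """
    matches = []
    for el in root.iter():
        if local_name(el.tag) != "debateSection" or el.get("name") != "debate":
            continue
        if not el.get("refersTo", "").startswith(division_ref):
            continue
        heading = next((c for c in el if local_name(c.tag) == "heading"), None)
        title, recorded = None, None
        if heading is not None:
            title = (heading.text or "").strip()
            for t in heading.iter():
                if local_name(t.tag) == "recordedTime" and t.get("time"):
                    recorded = datetime.fromisoformat(t.get("time"))
                    break
        matches.append({"eId": el.get("eId"), "refersTo": el.get("refersTo"),
                        "heading": title, "recordedTime": recorded})
    return matches


# The debate excerpt above, closed and wrapped in a made-up root element.
fragment = """<root>
<debateSection name="debate" refersTo="#bill.2020.17.dail.3_sub" eId="dbsect_23">
<heading>Residential Tenancies and Valuation Bill 2020: Committee and Remaining Stages
<recordedTime time="2020-07-30T15:20:00+01:00"/>
</heading>
<summary eId="sum_31"> Sections 1 and 2 agreed to.</summary>
</debateSection>
</root>"""

for debate in find_debates(ET.fromstring(fragment), "#bill.2020.17.dail."):
    print(debate["eId"], debate["heading"], debate["recordedTime"])
```

Matching on the full prefix, including its trailing dot, keeps bill.2020.17 from also matching a hypothetical bill.2020.170.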
The archive folder contains the initial offline data-transform and site-scraping scripts.