GovHack 2024 Entry - Parli Pulse

Primary language: Clojure. License: Apache License 2.0 (Apache-2.0).

Parli Pulse

This GovHack project uses AI to analyse political debates in the House of Reps, highlighting constructive and divisive moments. By making more data available about trends in parliamentary debates, I aim to improve democracy and showcase responsible AI use.

By providing this analysis, the project aims to promote healthier debate practices and ultimately enhance democracy by fostering a more collaborative and productive political environment.

It also aims to raise the level of public discourse around usage of AI by providing all the data and details of how the data is used and combined with LLM prompts to produce an interesting, and hopefully constructive result.

Go to the interactive dashboard

ParliPulse logo

ParliPulse dashboard

Transcript

In the heart of Australian democracy, a new pulse is beating. It's the pulse of data, analysis, and a vision for a healthier political landscape. This is Parli Pulse.

My name is Felix Barbalet, and I’ve been working with open government data since 2010. For this year’s GovHack competition, I’ve built Parli Pulse which harnesses the power of AI to analyse parliamentary debates in the House of Reps.

It identifies moments of constructive collaboration and highlights instances of divisive rhetoric.

The goal? To promote data-driven insights into parliamentary trends, fostering a more informed and engaged democracy.

Inspired by the Department of Home Affairs' call for a more resilient democracy, and the Infosys AI in Governance challenge, I set out to explore how AI could help Australians "Disagree Better."

I believe that open, respectful debate is crucial for a healthy democracy. Parli Pulse aims to shine a light on those moments, encouraging a more collaborative and productive political environment.

In addition to utilising AI, I’ve made the underlying data more accessible and open for anyone to reuse and analyse themselves.

See how AI classifies speeches, track trends over time, and discover the moments where politicians find common ground.

As part of this project, I’ve made the method used to build all this open and accessible, ensuring transparency and fostering trust in AI-driven tools.

Parli Pulse isn't just about analysing the past; it's about shaping the future. Imagine a world where political discourse is informed by data, where citizens are empowered to hold their representatives accountable, and where collaboration triumphs over division. This is the future we envision. Join us on this journey.

Motivation

It took me until Saturday lunchtime to settle on a project. I was largely inspired by the Department of Home Affairs report Strengthening Australian democracy: A practical agenda for democratic resilience. Specifically, some of the mentors explained how important community connectedness is, and that in Australia we have an incredible asset in our democratic society which, like all assets, should be looked after.

Reading through the report, I came across the example of Disagree Better:

Disagree Better – Amid intense political polarisation in the United States, the National Governors Association (NGA) is encouraging governors across the country to reduce partisan animosity and ‘disagree better’ by fostering respectful debate and modelling positive ways of working through policy problems. Building on the promising effects of a video which featured two opposing governor candidates advocating for bipartisanship and pro-democratic norms, the NGA has designed a toolkit of customisable public-facing interventions such as organising ‘service projects’ to bring communities together through volunteerism, recording an ad or writing an op-ed with someone from another party, and hosting debates at colleges and universities that model healthy conflict.

That made me think about whether AI could be a useful tool in helping us Disagree Better in Australia. And given the importance of political discourse in our democracy, using AI to analyse parliamentary discourse seemed like the logical next step.

While of course what happens in Parliament is political, this project is not meant to be a debate about one party versus another, but rather about how we could all learn to get along better and set examples that are meaningful to improve outcomes for all Australians.

In putting together the interactive dashboards, I have avoided presenting an "us versus them" view on anything. The dashboard allows anyone to explore the dimensions of the data, see (and understand) how AI has classified the discourse, and draw their own conclusions.

Challenge Entries

Practical Ideas for Community Rollout

  1. "Parli Pulse: Community Edition"
  • Initial Design: A simplified, user-friendly version of the Parli Pulse dashboard tailored for community use. It would highlight key trends in parliamentary debates, focusing on instances of constructive disagreement, collaboration, and examples where politicians bridged partisan divides.
  • Data Analysis and Evidence: Initial data analysis from the Parli Pulse project can identify specific debates or moments where positive interactions occurred. These can serve as evidence to support the idea that respectful discourse is possible even in a highly charged political environment.
  • Engagement Across Ages and Communities:
    • Schools: Integrate simplified versions of the dashboard into civics education, encouraging students to analyze real-world examples of political discourse.
    • Community Groups: Organize workshops or discussions around the data, facilitating dialogue on the importance of respectful disagreement and collaboration.
    • Online Platforms: Create social media campaigns or interactive quizzes to engage a wider audience and spark conversations about the project.
  2. "Disagree Better" Community Challenges
  • Initial Design: Inspired by the NGA's "Disagree Better" campaign, launch community challenges encouraging individuals to engage in respectful conversations with those who hold opposing views. Parli Pulse data can provide examples of successful bipartisan collaboration to inspire participants.
  • Data Analysis and Evidence: Analyze data to identify common ground found in parliamentary debates, even on contentious issues. This can serve as evidence that finding common ground is achievable, even in challenging situations.
  • Engagement Across Ages and Communities:
    • Schools: Organize debates or discussions where students must find common ground on a given issue, drawing inspiration from Parli Pulse data.
    • Community Groups: Facilitate structured conversations where participants with differing opinions can practice respectful disagreement and seek common ground.
    • Online Platforms: Create online forums or platforms where individuals can engage in moderated discussions on various topics, promoting respectful dialogue.

Data Sets Needed

  • Federal Parliament - House of Reps - Official Hansard: Continue using this dataset to expand the Parli Pulse analysis and identify more instances of positive political interactions.
  • Social Media Data: Analyze social media conversations related to political debates to gauge public sentiment and identify areas where respectful discourse is lacking.
  • Community Survey Data: Collect data through surveys or focus groups to understand community perceptions of political discourse and identify barriers to respectful disagreement.

Measuring Impact

  • Trust in Institutions: Track changes in public opinion surveys regarding trust in parliament and politicians.
  • Social Cohesion: Monitor social media sentiment analysis and conduct surveys to assess changes in community perceptions of social cohesion and willingness to engage with those holding different views.
  • Sense of Belonging: Use surveys and focus groups to measure individuals' sense of belonging within their communities and their willingness to participate in civic life.
  • Civic Awareness: Track increases in online engagement with Parli Pulse data and related educational materials.
  • Civic Participation: Monitor changes in voter turnout, attendance at community meetings, and participation in online discussions related to political issues.
  • Community Connections: Measure changes in community group membership, participation in volunteer activities, and self-reported feelings of connectedness to the community.
  1. Boosting Operational Efficiency:
  • Automating Analysis: Parli Pulse automates the analysis of parliamentary transcripts, a task that would be extremely time-consuming if done manually. This frees up government staff to focus on other priorities, improving overall efficiency.
  2. Improving Transparency:
  • Real-Time Visibility: The interactive dashboard provides real-time insights into parliamentary discourse, allowing citizens to see how their representatives are engaging in debates. This transparency can build trust and accountability.
  • Open Data and Methodology: Parli Pulse's commitment to open data and transparency about its AI methodology fosters a deeper understanding of how the analysis is conducted, further increasing transparency.
  3. Ensuring Ethical Use:
  • Focus on Constructive Discourse: Parli Pulse prioritizes identifying instances of constructive disagreement and collaboration, promoting a positive and ethical use of AI in the political sphere.
  • Transparency and Explainability: By making its data and methodology open, Parli Pulse allows for scrutiny and ensures that the AI is used ethically and without bias.
  4. Data Privacy and Security:
  • Public Data: Parli Pulse works with publicly available transcripts, ensuring that no personally identifiable information is used.
  • Anonymous Access: No login information is required to access the interactive dashboards.
  5. Building Public Trust:
  • Openness and Education: The project's emphasis on transparency and providing information on AI usage helps educate the public and build trust in AI-driven tools.
  • Focus on Positive Outcomes: Highlighting constructive political interactions can foster a more positive perception of politics and encourage trust in democratic institutions.
  6. Future Adaptations:
  • Sentiment Analysis: Future iterations of Parli Pulse could incorporate more nuanced sentiment analysis to capture the emotional tone of debates, providing even deeper insights.
  • Predictive Modeling: Advanced AI models could be used to predict potential areas of bipartisan collaboration or identify emerging issues that require attention.

Comprehensive Plan for Implementation

  • Expand Data Analysis: Analyze more parliamentary transcripts to build a more comprehensive picture of political discourse over time.

  • Refine AI Models: Continuously improve the AI models to ensure accuracy and address any potential biases.

  • Develop User-Friendly Interfaces: Create intuitive and accessible dashboards and visualizations to make the data easily understandable for a wider audience.

  • Engage with Stakeholders: Collaborate with government agencies, educational institutions, and community groups to promote the use of Parli Pulse and encourage informed discussions about political discourse.

  • Educate the Public: Develop educational resources and outreach programs to explain the project's AI methodology and foster understanding of responsible AI use.

Concrete Examples

  • Identify Bipartisan Opportunities: Parli Pulse could highlight specific instances where politicians from opposing parties found common ground on an issue, demonstrating the potential for collaboration.

  • Track Trends Over Time: Visualize changes in the level of constructive discourse over time, helping identify factors that contribute to positive or negative trends.

  • Educate Future Leaders: Integrate Parli Pulse into civics education programs, empowering students to analyze political discourse and engage in informed debates.

Presentation and Prototype

Data Sources

Description: The Federal Government Hansard is the official, edited transcript of the proceedings of the Australian Parliament. It records debates, speeches, questions, and other parliamentary business.

Note: I have focussed on just the House of Reps due to time constraints.

Usage: I wrote a small program that retrieves the House of Reps Hansards since 2012 in XML format and then parses those XML documents into a JSON file with the following fields:

{
  "session.no": "1",
  "info-type": "PRIVILEGE",
  "date": "2018-11-29",
  "speaker-name": "Morton, Ben, MP",
  "speaker-electorate": "Tangney",
  "chamber": "House of Reps",
  "page.no": "0",
  "proof": "0",
  "parliament.no": "45",
  "speaker-party": "LP",
  "info-title": "PRIVILEGE",
  "period.no": "7",
  "text": "..."
}
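
As a rough illustration of the flattening step described above, here is a minimal Python sketch that turns one sitting's XML into flat records of this shape. It is not the program used for this project. The XML snippet is a heavily simplified stand-in invented for the example, so the element names (`session.header`, `debate`, `speech` and their children) and the function name `flatten_sitting` are assumptions, not the actual Hansard schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sitting document. The real Hansard XML is
# richer and its element names may differ; this only illustrates flattening.
SAMPLE_XML = """
<hansard>
  <session.header>
    <date>2018-11-29</date>
    <parliament.no>45</parliament.no>
    <session.no>1</session.no>
    <period.no>7</period.no>
    <chamber>House of Reps</chamber>
    <page.no>0</page.no>
    <proof>0</proof>
  </session.header>
  <debate>
    <title>PRIVILEGE</title>
    <type>PRIVILEGE</type>
    <speech>
      <name>Morton, Ben, MP</name>
      <electorate>Tangney</electorate>
      <party>LP</party>
      <text>...</text>
    </speech>
  </debate>
</hansard>
"""


def flatten_sitting(xml_text):
    """Return one flat dict per speech, repeating the sitting-level fields."""
    root = ET.fromstring(xml_text)
    header = root.find("session.header")
    # Sitting-level fields (date, chamber, ...) are copied onto every record.
    sitting = {child.tag: (child.text or "") for child in header}

    records = []
    for debate in root.iter("debate"):
        for speech in debate.iter("speech"):
            record = dict(sitting)
            record["info-title"] = debate.findtext("title", default="")
            record["info-type"] = debate.findtext("type", default="")
            record["speaker-name"] = speech.findtext("name", default="")
            record["speaker-electorate"] = speech.findtext("electorate", default="")
            record["speaker-party"] = speech.findtext("party", default="")
            record["text"] = speech.findtext("text", default="")
            records.append(record)
    return records


if __name__ == "__main__":
    print(json.dumps(flatten_sitting(SAMPLE_XML), indent=2))
```

Copying the sitting-level fields onto every speech is what makes each record self-contained, at the cost of some repetition.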

The dataset I created for GovHack covers the following years:

Year Record Count
2024 7730
2023 11996
2022 7746
2021 11866
2020 10318
2019 8166
2018 11128
2017 11400
2016 8982
2015 13500
2014 12664
2013 7884
2012 11066

The types of speech included over that timeframe:

info_title record_count
BILLS 50238
STATEMENTS BY MEMBERS 26150
MATTERS OF PUBLIC IMPORTANCE 12056
ADJOURNMENT 9752
COMMITTEES 7684
MOTIONS 5608
PRIVATE MEMBERS' BUSINESS 4430
BUSINESS 2674
CONDOLENCES 2206
DOCUMENTS 1792
DISTINGUISHED VISITORS 1556
STATEMENTS ON INDULGENCE 1392
MINISTERIAL STATEMENTS 1018
PERSONAL EXPLANATIONS 870
PETITIONS 838
GOVERNOR-GENERAL'S SPEECH 820
AUDITOR-GENERAL'S REPORTS 560
STATEMENTS 550
STATEMENT BY THE SPEAKER 546
PARLIAMENTARY REPRESENTATION 448
MINISTERIAL ARRANGEMENTS 438
PARLIAMENTARY OFFICE HOLDERS 402
RESOLUTIONS OF THE SENATE 324
QUESTIONS WITHOUT NOTICE 296
QUESTIONS WITHOUT NOTICE: ADDITIONAL ANSWERS 272
DELEGATION REPORTS 270
DEATH OF HER MAJESTY QUEEN ELIZABETH II AND ACCESSION OF HIS MAJESTY KING CHARLES III 226
PRIVILEGE 146
STATEMENTS ON SIGNIFICANT MATTERS 138
REGULATIONS AND DETERMINATIONS 134
MINISTRY 132
QUESTIONS TO THE SPEAKER 120
REGISTER OF MEMBERS' INTERESTS 52
TARIFF PROPOSALS 50
PARTY OFFICE HOLDERS 46
SHADOW MINISTERIAL ARRANGEMENTS 42
NOTICES 38
PARLIAMENTARY ZONE 32
BUDGET 22
SHADOW MINISTRY 10
ADDRESS BY THE PRIME MINISTER OF THE UNITED KINGDOM 8
ADDRESS BY THE PRESIDENT OF THE PEOPLE'S REPUBLIC OF CHINA 8
ADDRESS BY THE PRIME MINISTER OF THE REPUBLIC OF INDIA 8
PRIME MINISTER OF PAPUA NEW GUINEA 6
PRESIDENT OF THE REPUBLIC OF THE PHILIPPINES 6
ADDRESS BY THE PRESIDENT OF THE REPUBLIC OF INDONESIA 6
ADDRESS BY THE PRIME MINISTER OF JAPAN 6
PRESIDENT OF UKRAINE 6
PARLIAMENTARY RETIRING ALLOWANCES TRUST 6
ADDRESS BY THE PRIME MINISTER OF SINGAPORE 6
STATEMENTS BY THE SPEAKER 2
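
Counts like the two tables above are straightforward to reproduce once the records are loaded. The following is a minimal Python sketch, assuming each record is a dict with the `date` and `info-title` fields shown earlier. The three sample records are made up for illustration, and the helper names `count_by_year` and `count_by_title` are hypothetical.

```python
from collections import Counter

# Tiny made-up sample in the record shape shown earlier. Real data would be
# loaded from the downloadable JSON file instead.
records = [
    {"date": "2018-11-29", "info-title": "PRIVILEGE"},
    {"date": "2018-06-19", "info-title": "BILLS"},
    {"date": "2024-02-06", "info-title": "BILLS"},
]


def count_by_year(records):
    """Count records per calendar year, using the YYYY prefix of the date."""
    return Counter(record["date"][:4] for record in records)


def count_by_title(records):
    """Count records per speech type (the info-title field)."""
    return Counter(record["info-title"] for record in records)


if __name__ == "__main__":
    for year, count in sorted(count_by_year(records).items(), reverse=True):
        print(year, count)
    for title, count in count_by_title(records).most_common():
        print(title, count)
```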

A big motivation for me in participating in GovHack is to make public datasets more accessible, so as part of this challenge I am providing a full copy of the JSON version of the Hansard dataset, covering more than 10 years.

This means everyone has a dataset that is far more usable than PDF or XML of individual sittings for several reasons:

  1. Ease of Analysis and Processing:
  • Structured Data: JSON is a structured data format, making it easy to parse and analyze with programming languages and data analysis tools. This allows for efficient searching, filtering, and extraction of specific information. PDFs and XML, while structured in their own ways, require more complex parsing and extraction techniques.
  • Machine Readability: JSON is inherently machine-readable, allowing for automated analysis and integration with other datasets or applications. This enables researchers, journalists, and developers to extract insights and patterns from the Hansard data without manual intervention. PDFs and XML, while machine-readable to some extent, require more complex processing for machine understanding.
  2. Data Integration and Interoperability:
  • Standard Format: JSON is a widely adopted standard for data exchange, making it easily compatible with various systems and applications. This allows for seamless integration of the Hansard data with other datasets or tools for comprehensive analysis and research. PDFs and XML, while standard formats, require more complex transformations for integration with other datasets or tools.
  • API Friendliness: JSON is commonly used in APIs, making it easy to access and interact with the Hansard data programmatically. This enables developers to build applications and services that leverage the Hansard data for various purposes. PDFs and XML, while accessible through APIs, require more complex handling for programmatic interaction.
  3. Efficiency and Scalability:
  • Compact Size: JSON is a relatively compact data format, making it efficient to store and transmit large volumes of data. This is crucial for a dataset spanning over 10 years of Hansard transcripts, which would be significantly larger in PDF or XML format.
  • Database Compatibility: JSON is natively supported by many modern databases, making it easy to store and manage the Hansard data for efficient querying and retrieval. PDFs and XML, while storable in databases, require more complex handling and indexing for efficient querying.
  4. Accessibility and Openness:
  • Text-Based: JSON is a text-based format, making it accessible to a wide range of users and tools. This promotes openness and transparency, enabling more people to access and analyze the Hansard data for various purposes. PDFs and XML, while accessible, require specific software or tools for viewing and analysis.

In summary: A JSON version of the Hansard dataset for over 10 years offers significant advantages in terms of ease of analysis, data integration, efficiency, and accessibility. It enables researchers, journalists, developers, and the public to leverage the vast amount of parliamentary information for various purposes, promoting transparency, accountability, and informed decision-making.

You can download the processed Hansard dataset here

You can also access the individual XML files here

Using AI to enrich the dataset

After getting the data into a format that was more usable for reuse and "hacking", my next step was to figure out how to use a Large Language Model to enrich the unstructured part of the dataset: politicians' speeches in parliament.

I decided to use Google's platform and the Gemini Pro 1.5 model. Given the time constraints, I needed to analyse thousands of free-text speeches as quickly as possible.

I started by uploading my dataset into BigQuery, and then followed the instructions to connect BigQuery with Gemini Pro 1.5.

If I had more time, I would have explored other available models to ascertain whether a smaller (cheaper) model would provide comparable results.

Once I had BigQuery and Gemini talking, the next step was to iterate through a series of LLM Prompts to find the right instructions for the model to classify the data.

As I was in the GovHack Hackerspace in Canberra while working on this element of my entry, I was able to crowdsource improvements to the prompt over an hour or two. Thanks to those mentors who helped!

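Here's the query I used to run a prompt over the speeches. While iterating, I tested each candidate prompt on 10 random speeches; the version shown here has no such limit and instead filters on date and speech type.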

CREATE TABLE `govhack-24-parlipulse.federal_hansard.first_output` as
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `govhack-24-parlipulse.federal_hansard.g15pro`,
      (
      SELECT CONCAT(
        """You are a analyst at a think tank that studies deliberative Democracy and strengthening global governance. 
        Analyze the following transcript of a political speech in parliament and classify its overall tone and content on a scale of 1 to 5 (RATING) where:

* **1** represents a **very polite, constructive, and collaborative** speech.
* **5** represents a **very impolite, divisive, and unconstructive** speech.

Consider the following factors in your analysis:

* **Language and Tone:** Assess the use of respectful language, the presence of personal attacks or inflammatory rhetoric, and the overall tone of the speech (e.g., conciliatory vs. antagonistic).
* **Focus on Issues:** Evaluate whether the speech primarily focuses on addressing policy issues and presenting solutions or if it's centered on criticizing opponents and creating division.
* **Collaboration and Compromise:** Determine if the speech demonstrates a willingness to collaborate with others, find common ground, and seek compromise, or if it adopts a rigid and uncompromising stance.
* **Respect for Others:** Consider whether the speech shows respect for differing viewpoints and acknowledges the contributions of others, or if it dismisses opposing perspectives and seeks to undermine them.

Provide specific examples from the transcript to support your classification in the REASONING field. 
Summarize the key subjects or issues being discussed, in the SUBJECTS field 
Identify the speakers primary position on the key subjects, whether they Supports or Opposes in the POSITION field.

Your response should be a JSON object without backticks or anything else, in the following form

{
 "RATING":YOUR_RATING, #Number between 1 and 5
 "SUBJECTS":SUBJECTS_DISCUSSED, #Top 3, each in no more than one to two words, prioritising specific policy proposals if relevant (e.g. Climate; Taxation; Migration)
 "POSITION":SPEAKERS_POSITION_ON_SUBJECT, #One of the following, with no other content: "Supports; Opposes; Other"
 "REASONING":REASONING #Free text, no more than 200 words
 }

Input transcript:
"""
      , text) AS prompt,
        speaker_party, speaker_electorate, info_title, date, speaker_name,
        CHAR_LENGTH(text) AS character_count
      FROM federal_hansard.import
      WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH)
        AND info_title IN ('BILLS', 'STATEMENTS BY MEMBERS', 'MATTERS OF PUBLIC IMPORTANCE', 'MOTIONS', "PRIVATE MEMBERS' BUSINESS", "QUESTIONS WITHOUT NOTICE: ADDITIONAL ANSWERS")
    ),
  STRUCT(4096 AS max_output_tokens, 0 AS temperature,
         0.95 AS top_p, true AS flatten_json_output,
         false AS ground_with_google_search)
);
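
The prompt asks the model for a bare JSON object, but model output is not guaranteed to comply, so anything consuming the results benefits from a defensive parsing step. The following is a minimal Python sketch of such a step, not the code used for this project. The function name `parse_classification` is hypothetical, and the handling choices are assumptions made for illustration: tolerating a Markdown fence, accepting `SUBJECTS` as either a list or a semicolon-separated string, and mapping unexpected `POSITION` values to "Other".

```python
import json

ALLOWED_POSITIONS = {"Supports", "Opposes", "Other"}
FENCE = "`" * 3  # a Markdown code fence, built here to keep this listing simple


def parse_classification(raw):
    """Parse one model response into a dict, or return None if unusable."""
    cleaned = raw.strip()
    # Models sometimes wrap JSON in a Markdown fence despite instructions.
    if cleaned.startswith(FENCE):
        cleaned = cleaned.strip("`").strip()
        if cleaned.lower().startswith("json"):
            cleaned = cleaned[4:]
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    # RATING must be a whole number from 1 to 5 (bool is excluded explicitly
    # because Python treats True/False as integers).
    rating = data.get("RATING")
    if isinstance(rating, bool) or not isinstance(rating, int):
        return None
    if not 1 <= rating <= 5:
        return None

    # Assumption: unexpected POSITION values are mapped to "Other".
    position = str(data.get("POSITION", "")).strip()
    if position not in ALLOWED_POSITIONS:
        position = "Other"

    # Assumption: SUBJECTS may arrive as a list or as "A; B; C".
    subjects = data.get("SUBJECTS", [])
    if isinstance(subjects, str):
        subjects = subjects.split(";")
    elif not isinstance(subjects, list):
        subjects = []
    subjects = [str(s).strip() for s in subjects if str(s).strip()]

    return {
        "rating": rating,
        "subjects": subjects,
        "position": position,
        "reasoning": str(data.get("REASONING", "")),
    }
```

Returning `None` for unusable responses keeps the failures countable, which is useful when judging how well a prompt is working.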

The prompt above is the final version: the one I used to produce the output dataset below, which drives the interactive dashboard.

My initial attempt to run this prompt over all the data timed out after 6 hours, so instead I ran it over the last 12 months (to 7 September 2024), which produced over 4,000 enriched records.

This took almost 4 hours to run. Unfortunately, I did not have time to optimise this part of the analysis.

You can download this dataset here or explore the data using an interactive dashboard here
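
For anyone who wants to look at trends outside the dashboard, the enriched records lend themselves to simple aggregation. The following is a minimal Python sketch that averages ratings per month. It assumes the model responses have already been parsed so that each row carries a `date` string and an integer `RATING`. The sample rows are made up, and the helper name `mean_rating_by_month` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration. Real rows would come from the enriched
# dataset after the model's JSON response has been parsed.
rows = [
    {"date": "2024-02-06", "RATING": 2},
    {"date": "2024-02-13", "RATING": 4},
    {"date": "2024-03-19", "RATING": 1},
]


def mean_rating_by_month(rows):
    """Return {"YYYY-MM": mean rating}, ordered by month."""
    by_month = defaultdict(list)
    for row in rows:
        by_month[row["date"][:7]].append(row["RATING"])
    return {month: mean(ratings) for month, ratings in sorted(by_month.items())}


if __name__ == "__main__":
    for month, average in mean_rating_by_month(rows).items():
        print(month, average)
```

On the 1 to 5 scale used in the prompt, a lower monthly average indicates speeches rated as more constructive.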

I would love to extend this analysis further in the future.

Use of AI

Note: I have extensively used AI to help me draft my code, SQL queries, LLM Prompts and free text (such as this page).

All code in this repository is available under the Apache License 2.0. All downloadable data, and all text in this repo that is not code, is licensed under CC BY-SA 4.0.