With the German federal election (Bundestagswahl) approaching in September 2021, INWT-Statistics aims to build a comprehensive model that forecasts the Bundestagswahl results.
This repository provides the code behind our election forecasts published on www.wer-gewinnt-die-wahl.de, which are produced by a dynamic multilevel Bayesian model written in R and Stan.
Improving on our 2017 Bundestagswahl forecast, this year's model is a long-short term memory state-space model for election forecasting (Groß, 2019). It simulates election outcomes from poll, election, and government data, models a long-short term voter memory, and accounts for different sources of uncertainty.
For press coverage of our 2017 forecast, please refer to these articles on Die Welt, the New York Times, and Sueddeutsche.
State-space models are a common choice for modelling voting intentions from poll data. For the 2021 Bundestagswahl, we go beyond random-walk approaches by introducing a long-short term event memory effect. Because vote shares tend to revert to a party's long-term trend after larger short-term movements, we hypothesize that events influencing the vote share can be decomposed into:
- a short-term effect due to media information spreading,
- a smaller remaining long-term effect e.g. new events and ‘forgetfulness’.
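The decomposition above can be illustrated with a small deterministic simulation. This is only a sketch under assumed values, not the Stan specification used in this repository: the names `longShare` and `decay`, the baseline of 30 %, and the size and timing of the event are all hypothetical.

```r
# Illustrative only: every parameter value below is an assumption.
nWeeks <- 52
shock <- numeric(nWeeks)
shock[10] <- 3        # one event worth +3 percentage points in week 10
longShare <- 0.3      # assumed fraction of a shock that persists long-term
decay <- 0.8          # assumed weekly retention of the short-term part

# Long-term effect: the persistent part of each shock accumulates.
longTerm <- cumsum(longShare * shock)

# Short-term effect: the remaining part fades geometrically over time.
shortTerm <- numeric(nWeeks)
for (t in 2:nWeeks) {
  shortTerm[t] <- decay * shortTerm[t - 1] + (1 - longShare) * shock[t]
}

# Vote share in percent around an assumed baseline of 30 %.
voteShare <- 30 + longTerm + shortTerm
```

In this sketch the vote share jumps in the week of the event and then drifts back towards a level that lies only slightly above the old baseline, which is the reversion pattern the hypothesis describes.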
We also categorized sources of uncertainty in forecasting vote share into two types:
- uncertainty about the future events, i.e. shocks to vote share
- uncertainty in polling:
  - common bias of all pollsters for a specific party
  - house bias of a specific pollster for a specific party
  - polling uncertainty of a specific pollster
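To make the polling-related sources of uncertainty concrete, the following sketch simulates polls for a single party as the true share plus a common bias, a house bias, and pollster-specific noise. The pollster labels, bias values, and standard deviations are hypothetical, and the additive normal form is an assumption made for illustration rather than the model's actual likelihood.

```r
# Illustrative only: pollster names and all numeric values are assumptions.
set.seed(42)
nPolls <- 200
pollsters <- c("A", "B", "C")                      # hypothetical pollsters
trueShare <- 0.25                                  # assumed true vote share
commonBias <- 0.01                                 # bias shared by all pollsters
houseBias <- c(A = -0.005, B = 0.000, C = 0.008)   # pollster-specific bias
pollSd <- c(A = 0.010, B = 0.012, C = 0.015)       # pollster-specific noise

# Assign each simulated poll to a pollster and build its expected value.
pollster <- sample(pollsters, nPolls, replace = TRUE)
expected <- trueShare + commonBias + houseBias[pollster]

# Observed poll result: expected value plus pollster-specific noise.
observed <- rnorm(nPolls, mean = expected, sd = pollSd[pollster])

# Mean deviation from the truth per pollster (common bias + house bias).
biasByPollster <- tapply(observed - trueShare, pollster, mean)
print(round(biasByPollster, 3))
```

Averaging polls removes most of the pollster-specific noise but none of the common bias, which is one reason a simple poll average can be confidently wrong.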
Our model
- outperforms poll averages by around 15 %,
- delivers an accurate statement about the uncertainty of our forecasts,
- extends forecasting to events that describe the likelihood of government coalitions or government participation of individual parties.
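Event probabilities of this kind can be read off simulated election outcomes by counting the share of simulations in which the event occurs. The sketch below assumes hypothetical parties `A` to `F`, made-up mean vote shares, and a simplified majority rule (combined share above 50 %); it does not reproduce `eventsDE()` or `koalitionDE()`.

```r
# Illustrative only: party labels, means, and the majority rule are assumptions.
set.seed(7)
nDraws <- 4000
meanShare <- c(A = 0.30, B = 0.22, C = 0.18, D = 0.12, E = 0.10, F = 0.08)

# One column of simulated vote shares per party.
draws <- sapply(meanShare, function(m) rnorm(nDraws, mean = m, sd = 0.02))
draws[draws < 0] <- 0
draws <- draws / rowSums(draws)   # shares sum to 1 within each simulation

# Probability of an event = fraction of simulations in which it occurs.
coalitionProb <- function(draws, members) {
  mean(rowSums(draws[, members, drop = FALSE]) > 0.5)
}

probAB <- coalitionProb(draws, c("A", "B"))
probAC <- coalitionProb(draws, c("A", "C"))
print(c(AB = probAB, AC = probAC))
```

Because the probability is computed per simulation, correlations between party shares carry over to the event automatically.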
A detailed description of the model specification and performance can be found here, and a presentation held at the BerlinBayes Meetup can be found here. For a quick overview, see this poster presented at the Stan 2019 conference, where we first introduced our model. Please note that the poster refers to an older version of the model; we have added a few more features since then.
Our model uses three different types of input data:
- Polling data: more than 4,000 polls for the German federal election ("Bundestagswahl") from eight different pollsters, covering November 1st, 1994, through the current date. We scrape this data from www.wahlrecht.de, which collects all available polling data and is frequently updated.
- Election outcome data: the model considers election results and the government / opposition status of the six large parties for all German federal elections since 1998. For a given election result, multiple coalitions are usually conceivable, and the government is formed independently of the voters.
- Expert interviews: the model also makes use of expert interviews that reflect expert opinions on the coalition preferences of the parties. Experts are defined as political scientists or people working for a party / a party-affiliated foundation. The experts were given a list of potential coalitions and ranked them under the premise of independence from potential election results or actual polling. To date, the model considers 16 interview responses with 12 rankings. This data also provides critical priors for our Bayesian workflow. For a more detailed methodology, please read the methods section.
dataDE <- loadDataDE(predDate)
# Returns a list of poll data, election data, and coalition data
# `predDate`: the date of running the model, e.g. as.Date("yyyy-mm-dd")
dataPrep <- preparePollData(dataDE$pollData, dataDE$Elections, predDate)
# Returns a list of cleaned data sets ready for modeling
# `dataDE$pollData`: the imported poll data from the directory
# `dataDE$Elections`: the historical German election results
# `predDate`: date of running the model
modelResults <- compileRunModel(dataPrep$modelData)
# Returns a list of Stan models after sampling
# `dataPrep$modelData`: the formatted, cleaned input data produced by `preparePollData()`
plotForecast <- plotElectionData(modelResults, dataPrep, predDate, dataDE$pollData, start = "2016-01-01")
# Returns a list containing a ggplot graph and JSON file output
# `modelResults`: the list of model output from `compileRunModel()`
# `dataPrep`: the output of `preparePollData()`
# `predDate`: date of running the model
# `dataDE$pollData`: the imported poll data from the directory
# `start`: start date in the format "yyyy-mm-dd"
fact_forecast <- getForecastTable(modelResults, dataPrep, predDate)
# Returns a table with the forecast for each political party
# `modelResults`: the list of model output from `compileRunModel()`
# `dataPrep`: the list of cleaned data sets from `preparePollData()`
# `predDate`: date of running the model, as.Date("yyyy-mm-dd")
fact_event_prob <- eventsDE(modelResults, dataPrep, predDate)
# Returns a data frame of events derived from the election forecasts
# `modelResults`: the list of model output from `compileRunModel()`
# `dataPrep`: the list of cleaned data sets from `preparePollData()`
# `predDate`: date of running the model, as.Date("yyyy-mm-dd")
fact_coalition_prob <- koalitionDE(dataDE$Koalitionen, modelResults, dataPrep, predDate)
# Returns a data frame of estimates for the possible party coalitions
# `dataDE$Koalitionen`: data frame returned by `loadDataDE(predDate)`
# `modelResults`: the list of model output from `compileRunModel()`
# `dataPrep`: the list of cleaned data sets from `preparePollData()`
# `predDate`: date of running the model, as.Date("yyyy-mm-dd")
fact_part_of_government <- partOfGovernmentDE(fact_coalition_prob, predDate)
# Returns a data frame with the summed estimates over the possible coalitions
# `fact_coalition_prob`: data frame that contains the party coalition estimates
# `predDate`: date of running the model, as.Date("yyyy-mm-dd")
A more detailed tutorial on how to run the model can be found here.