ger-man-2017

Text corpus of the 2017 German federal election manifestos

This repository contains the 2017 German federal election manifestos of 24 parties participating in the 2017 general election. The analyses below are based on the six most popular parties (CDU/CSU, SPD, Bündnis 90/Die Grünen, DIE LINKE, AfD, FDP), because some of the smaller parties either have very short manifestos (which makes it difficult to scale these documents) or only published a fundamental program rather than an election-specific manifesto.

The manifestos are loaded into R as a quanteda corpus. You can clone the repository to use the text corpus, or download the file corpus_ger_man_2017.Rdata and use the commands below to import the texts as a quanteda corpus. The raw manifestos are available as PDF files in the folder manifestos-pdf.

## Load packages
library(quanteda)
library(ggplot2)
library(dplyr)

## Load corpus file
load("path/to/file/corpus_ger_man_2017.Rdata")

## Only select the six major parties for the analysis
parties_select <- c("CDU-CSU", "SPD", "AfD", "Gruene", "Linke", "FDP")
corpus_ger_man_2017 <- corpus_subset(corpus_ger_man_2017, party %in% parties_select)

## Get summary of documents in corpus
summary(corpus_ger_man_2017)

# Corpus consisting of 6 documents.
# 
#     Text Types Tokens Sentences year   party      type
#      AfD  5972  21272       742 2017     AfD manifesto
#  CDU-CSU  4974  23034      1291 2017 CDU-CSU manifesto
#      FDP  9039  43383      2039 2017     FDP manifesto
#   Gruene 13185  81525      4011 2017  Gruene manifesto
#    Linke 12308  77004      2885 2017   Linke manifesto
#      SPD  8396  43827      2411 2017     SPD manifesto
# 
# Source:  Party manifestos of 2017 German federal election
# Created: Mon Aug 28 23:03:20 2017
# Notes:   Corpus created by Stefan Müller (muellerstefan.net)
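
If you prefer to rebuild the corpus from the raw PDF files in the folder manifestos-pdf rather than loading the .Rdata file, a minimal sketch with the readtext package could look like the following (this is not part of the repository scripts; it assumes readtext is installed and that document variables such as party would still have to be added manually):

## Sketch: build a corpus directly from the raw PDFs in manifestos-pdf/
## (assumes the readtext package is installed; docvars such as "party"
## would still need to be assigned afterwards)
library(readtext)

manifestos_raw <- readtext("manifestos-pdf/*.pdf")
corpus_from_pdf <- corpus(manifestos_raw)
summary(corpus_from_pdf)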

You can use the script load_and_explore_corpus.R to load the corpus into R, transform it to a document-feature matrix, get the most frequent words for each manifesto, and estimate Wordfish and Correspondence Analysis positions.

Plot the dispersion of the word "Gerechtigkeit" (justice, fairness) across the manifestos.

textplot_xray(kwic(corpus_ger_man_2017, "Gerechtigkeit"))

The dispersion of the word "Gerechtigkeit" (justice, fairness) across the manifestos.

Interestingly, the CDU/CSU mentions "Gerechtigkeit" only once, while the word is missing entirely from the FDP manifesto. The Greens devote a large section of their manifesto to justice, whereas in the manifestos of the Left party (DIE LINKE) and the SPD the word is spread throughout the entire document. The token index for all manifestos (x-axis) is rescaled and standardised from 0 to 1.
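
To check these counts directly (a quick sketch, not part of the original script), you can tabulate the keyword-in-context matches by document:

## Count matches of "Gerechtigkeit" per manifesto
kwic_gerechtigkeit <- kwic(corpus_ger_man_2017, "Gerechtigkeit")
table(kwic_gerechtigkeit$docname)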

Plot the 15 most frequent words per manifesto (after removing stopwords).

## Define additional German stopwords
stopwords_additional <- c("ab", "dass", "deshalb", "seit",
                          "statt", "n", "sowie")

## Transform to document feature matrix
dfm_man <- dfm(corpus_ger_man_2017, remove = c(stopwords("german"), stopwords_additional),
               tolower = TRUE, remove_punct = TRUE, remove_numbers = TRUE)

## Weight dfm by relative frequency
dfm_man_weight <- dfm_weight(dfm_man, type = "relfreq")

## Get most frequent words by party
freq <- textstat_frequency(dfm_man_weight, 
                           groups = docvars(corpus_ger_man_2017, "party"), 
                           n = 15)

freq_ordered <- freq[seq(dim(freq)[1],1),] # reorder rows for plotting

freq_ordered$order <- 1:nrow(freq_ordered)

## Plot most frequent words by party
ggplot(data = freq_ordered, aes(x = order, y = frequency)) +
  geom_point() +
  facet_wrap(~ group, scales = "free", nrow = 2) +
  coord_flip() +
  scale_x_continuous(breaks = freq_ordered$order, 
                     labels = freq_ordered$feature) +
  labs(x = NULL, y = "Relative frequency") + 
  theme_custom() # custom ggplot2 theme; use theme_minimal() if theme_custom() is not defined in your session

The 15 most frequent words per manifesto (after removing stopwords).
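
As a quick sanity check on the stopword removal (not part of the original script), topfeatures() returns the overall most frequent features of the unweighted dfm:

## Quick check: ten most frequent features across all six manifestos
topfeatures(dfm_man, n = 10)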

Estimate party positions with a Wordfish model and Correspondence Analysis.

## Select minimum frequency and occurrence
dfm_man_trim <- dfm_trim(dfm_man, min_count = 2)

## Run Wordfish model
model_wordfish <- textmodel_wordfish(dfm_man_trim)

## Run correspondence analysis
model_ca <- textmodel_ca(dfm_man_trim)

## Plot models

## Create label
party_label <- paste(docvars(corpus_ger_man_2017, "party"), "2017", sep = " ")

textplot_scale1d(model_wordfish, margin = "documents", doclabels = party_label) +
  labs(title = "Wordfish estimates")

textplot_scale1d(model_ca, doclabels = party_label) +
  labs(title = "Correspondence analysis")

Wordfish and Correspondence Analysis estimates of the manifestos
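
Beyond the document positions shown above, the word-level Wordfish estimates can be plotted as well. A short sketch (not part of the original script; the highlighted terms are only illustrative and assume these features occur in the dfm):

## Optional: plot word-level Wordfish estimates and highlight selected terms
textplot_scale1d(model_wordfish, margin = "features",
                 highlighted = c("gerechtigkeit", "freiheit"))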

To cite the corpus in publications, please use the following:

  Müller, Stefan. 2017. ger-man-2017: Text corpus of the 2017 German federal election 
  manifestos. Version 1.0: http://github.com/stefan-mueller/ger-man-2017.

  @Manual{,
    title = {ger-man-2017: Text corpus of the 2017 German federal election manifestos},
    author = {Stefan Müller},
    year = {2017},
    note = {Version 1.0},
    url = {http://github.com/stefan-mueller/ger-man-2017},
  }