Britain has a long tradition of 'leader's speeches', traditionally held at party conferences. They can tell us a lot about the priorities and focus of a politician, and can say something about the spirit of the time. With new digital methods emerging, it is increasingly possible to map these speeches linguistically and search for the specific fingerprints of the political actors. SAPS is an exploratory research project focussing on just that: extracting and analysing the political speeches of British party leaders. In this study we mostly look at the speeches as the product of a party; however, we also focus specifically on Winston Churchill as a case study. We use data from two archives, the Britishpoliticalspeech.org archive and the International Churchill Society. The main techniques used in this project are scraping (to extract the data), exploratory analysis (to get a better idea of what the data contain), and stylometric analysis (to enrich the data). We have documented the whole process in Jupyter Book format, which can be found using this link. The Jupyter Book was created using the repository found here.
This file is a guide to the repository, containing all the relevant information on the existing holdings of the project.
The first main folder, titled 'Analysis', contains two Jupyter Notebooks. The first consists of an exploratory analysis, including some visualizations based on the Python libraries pandas, seaborn and matplotlib. The second Notebook contains the main stylometric analysis (based on the Programming Historian tutorial by François Dominic Laramée (2018)), applied to the speeches of the political parties and to our case study, Winston Churchill. The first Notebook uses the 'metadata.csv' file and the second uses the TXT files. All these files are provided in the ‘Metadata and Texts’ folder of this repository.
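To give a rough idea of what a stylometric comparison involves, the sketch below implements a chi-squared distance between two texts, one of the methods covered in the Programming Historian tutorial. It is a minimal illustration and not the code of the Notebook itself: the function names, the `top_n` default and the two sample sentences are assumptions made up for this example, and it uses only the Python standard library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def chi_squared_distance(tokens_a, tokens_b, top_n=500):
    """Chi-squared distance over the top_n most common words of the joint corpus.

    A lower value means the two texts use common words in more similar proportions.
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both token lists must be non-empty")

    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    joint = counts_a + counts_b
    # Share of the joint corpus that comes from text A.
    share_a = len(tokens_a) / (len(tokens_a) + len(tokens_b))

    chi2 = 0.0
    for word, joint_count in joint.most_common(top_n):
        # Counts we would expect if both texts drew on the same vocabulary.
        expected_a = joint_count * share_a
        expected_b = joint_count * (1 - share_a)
        chi2 += (counts_a[word] - expected_a) ** 2 / expected_a
        chi2 += (counts_b[word] - expected_b) ** 2 / expected_b
    return chi2


# Made-up sample sentences, not taken from the project's TXT files.
text_a = tokenize("the party will build the houses the people need")
text_b = tokenize("our party must defend freedom and our people")

same = chi_squared_distance(text_a, text_a)
different = chi_squared_distance(text_a, text_b)
print(same, different)
```

Comparing a text with itself gives a distance of zero, while texts with different word usage give a larger value. With real speeches, the token lists would be built from the contents of the TXT files.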
The second main folder, titled ‘Metadata and Texts’, contains the data on which the analysis is based. It contains a zipped (compressed) file with the whole set of texts and a CSV file with all the metadata. Most TXT files and the CSV file were scraped using the scrapers from the ‘Scrapers’ folder. However, the TXT files of Winston Churchill's speeches (speeches 355 to 364) were added manually. The selected variables of the dataset are the following: 'id', generated to uniquely identify every record; 'speaker', the full name of each politician, consisting of the first name, an optional middle name and one or more surnames; 'party', the political party, group or formation of the speaker; 'location', the place where each speech took place; 'date', the exact day/month/year on which the speech was delivered; 'name speech', a generated title; and lastly the 'speech' itself.
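The variables listed above can be read with a few lines of Python. The following is a minimal sketch using only the standard library. The sample rows are invented placeholders and not real project data, the `load_metadata` helper is hypothetical, and the `%d/%m/%Y` date format is an assumption based on the day/month/year description above.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Invented placeholder rows that reuse the column names described above.
SAMPLE_CSV = """id,speaker,party,location,date,name speech,speech
1,Speaker A,Party X,Town One,05/10/1950,Speech 1,First example text
2,Speaker B,Party Y,Town Two,12/10/1951,Speech 2,Second example text
3,Speaker A,Party X,Town Three,09/10/1953,Speech 3,Third example text
"""


def load_metadata(handle):
    """Read metadata rows, converting 'id' to int and 'date' to a date object."""
    rows = []
    for row in csv.DictReader(handle):
        row["id"] = int(row["id"])
        # Assumed format: day/month/year separated by slashes.
        row["date"] = datetime.strptime(row["date"], "%d/%m/%Y").date()
        rows.append(row)
    return rows


rows = load_metadata(io.StringIO(SAMPLE_CSV))

# Count how many speeches each party contributes to the dataset.
speeches_per_party = Counter(row["party"] for row in rows)
print(speeches_per_party)
```

For the real file, the in-memory string would be replaced by something like `open("metadata.csv", newline="", encoding="utf-8")`, with the path and the date format adjusted to match the actual data.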
The third main folder, titled ‘Scrapers’, again contains Jupyter Notebooks. These scraped the Britishpoliticalspeech archive for the speeches' metadata (the CSV scraper) and for the texts themselves (the TXT scraper, which is the more important of the two).
Finally, the repository contains a few more files: ‘mynewbook’, which is an auto-generated file; a Creative Commons Attribution 4.0 International Public License; and the Data Management Plan, which uses the ‘Science Europe Template’.