tracing media coverage of the hanau attack
I plug search terms into the news websites search interfaces that identify the articles mentioning each attack fairly accurately. to achieve this, I combine a location/event-specific prefix with a suffix that indicates an attack. articles have to contain both parts. the suffixes are "attentat", "terror", and "anschlag". the prefies are attach specific:
Attack | prefix 1 | prefix 2 | prefix 3 |
---|---|---|---|
Hanau | hanau | kesselstadt | tobias r***** |
Berlin | breitscheidplatz | berliner weihnachtsmarkt | anis a*** |
Ansbach | ansbach | fränkisches musikfestival | mohammed d***** |
Würzburg | würzburg regionalbahn | Winterhausen | riaz khan a***** |
(attacker's names are only anonymized in this table. #SayTheirNames is for the victims only.)
i plug in all possible combinations of these terms with the suffixes into the search engines at www.spiegel.de/suche, www.faz.net/suche/, www.sueddeutsche.de/news, and bild.de/suche.bild.html and save the returned articles. of course, several articles come up for multiple searches, these duplicates are removed and only counted as one.
false positives: one strong indication that they do a good job at finding only relevant articles, is that only very few many hits for each of these terms are articles from before the attack took place. if the search terms were so broad that they identify random texts on different topics we should expect them to also return a number of articles from before the attack took place. they do not (as can easily be checked here: https://simonheb.shinyapps.io/hanau-media/).
false negatives: if you are aware of any article in the respective journals that deal with the attacks but are not listed in the data on https://simonheb.shinyapps.io/hanau-media/, please let me know. this would be useful to refine the searches.
because they have shitty search interfaces on their websites, which do not allow to search for older articles
i am happy to do that. i am even more happy if you do that. this project is open source. i would just suggest that, if you add something to it indepently from me, to please consider also pushing your changes to this repository so that we can all benefit from it results.
the different search interfaces require different ways of finetuning. this section provides a short overview of these measures.
spiegel allows for some filtering. i restrict the search to anything that is published in "DER SPIEGEL" or "SPIEGEL+", thus effectively excluding "SPIEGEL-PRINT", "SPIEGEL International", etc. bento.de was discontinued and merged into spiegel, this has some bearing on the results here as bento.de ran a series of interviews with the hanau victims family that significantly alter the curves around august 2020, which would be lower without the bento.de articles.
sz allows filtering results. i restrict the search to "articles" only (no videos, galeries, or other links) and only to those that can be attributed to "sueddeutsche.de", thus effectively excluding dpa news. sz also has a slighly different format for search queries. to ensure matching results I thus replace spaces with " AND ". the search term "hanau anschlag" thus becomes "hanau AND anschlag". also, only for the hanau attack, it seems that sz had a policy of not spelling out the attackers full lastname, which is why replace that with an abbreviated "tobias r" in all of the search queries.
sz allows filtering results. i restrict the search to "articles" only (no videos, galeries, etc).
does not allow for subcategories in their search interface, returned results include dpa/agency texts, eilmeldungen and others
pseudo apis.R
contains routines to access websearches of news websites.1 - download search results.R
downloads search results from all news websites.2 - clean data.R
cleans the search results to prepare them to be used in the app.app.R
is the shiny app.*.RDS
store the latest data files after each step in the data pipeline.