Primary language: Jupyter Notebook. License: MIT.

Namarsh dataset supplementary materials and data

Supplementary Namarsh material as an example of data wrangling and processing in Python. Namarsh was a website that provided reports on contentious events. It ceased to exist in 2022. Most of its materials were backed up by Archive.org. This simple script lets you work with the .html files preserved by the Wayback Machine.
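As a rough illustration of working with an archived .html file, the sketch below pulls the visible text out of a page using only the Python standard library. It is not the code from the notebooks: the class name `VisibleTextParser`, the helper `extract_text`, and the sample markup are all hypothetical, and the real Namarsh pages may be structured differently.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text fragments of an HTML string, in page order."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up markup standing in for a saved Wayback Machine page.
# For a real file, read it first, e.g. Path("page.html").read_text(encoding="utf-8").
sample_html = """
<html><head><title>News of Protest</title><script>var x = 1;</script></head>
<body><h2>Rally in Moscow</h2><p>About 200 people gathered.</p></body></html>
"""

print(extract_text(sample_html))
```

A dedicated HTML library would likely be more convenient for selecting specific elements; the standard-library parser is used here only to keep the sketch dependency-free.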

Use

This code may be freely modified, published, used, or reused. It was used to collect and process the Namarsh data and is shared for replicability and as guidance on how protest event data can be collected and processed. It serves as supplementary material for the Namarsh dataset and paper.

Run

The code is stored in the Jupyter notebook format and can be run in the browser using Google Colab or Kaggle, with a local Python installation, or in an IDE such as PyCharm.

Files

The namarsh-news-of-protest.ipynb file is a Jupyter notebook that contains the steps used to extract data from the News of Protest section.

The namarsh-translate.ipynb file is a Jupyter notebook that was used to translate the dataset. It relies on the library at https://github.com/Animenosekai/translate.

The towns.csv file is a list of Russian localities that was used to extract location data from the description part of the News of Protest section.

The supplementary folder contains reports written by Namarsh for the website, stored in .txt format for further analysis. It also contains .csv files from the dataset for the Forthcoming Events and News of Protest subsections.
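The way a locality list such as towns.csv can be used to pull location data out of a description can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes the CSV holds one locality name per row in the first column with no header, and the helper names `load_towns` and `find_towns`, as well as the sample rows and description, are made up.

```python
import csv
import io
import re


def load_towns(csv_file):
    """Read locality names from the first column of an open CSV file object."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]


def find_towns(description, towns):
    """Return the localities that appear as whole words in the description."""
    found = []
    for town in towns:
        # Lookarounds prevent matches inside longer words.
        pattern = r"(?<!\w)" + re.escape(town) + r"(?!\w)"
        if re.search(pattern, description, flags=re.IGNORECASE):
            found.append(town)
    return found


# Made-up stand-in for towns.csv (assumed layout: one name per row, no header).
# For the real file: open("towns.csv", encoding="utf-8", newline="").
sample_csv = io.StringIO("Москва\nКазань\nОмск\n")
towns = load_towns(sample_csv)

# Made-up description in the style of a short event summary.
description = "Казань: митинг против застройки парка. Омск: одиночный пикет."

print(find_towns(description, towns))
```

Whole-word matching only finds a locality when it appears in the same form as in the list, so inflected Russian forms (for example "в Омске") would be missed; handling those would need lemmatization or a list of alternative forms.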