This is a repository of scripts written to collect data on Wikipedia articles and lists, with the aim of understanding how language is fought over on Wikipedia. The research is based on collecting lists of articles compiled by Wikipedia editors, or categories, covering American novels, TV shows, and films, and on compiling the revisions to the individual articles included in those lists over time.
The project is being conducted for the McGill txtLab. Using wikipedia-histories, a Python module for scraping article revision histories, the code in this repository builds social network representations of editor relationships, capturing how editors cross over between domains.
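For illustration, a minimal sketch of that pipeline is shown below, assuming the documented wikipedia-histories entry point `get_history`. The sample article titles, the `user` attribute on revision objects, and the networkx bipartite-projection step are illustrative assumptions, not the project's exact method.

```python
import networkx as nx
import wikipedia_histories

# Hypothetical sample: a few articles drawn from two domains
domains = {
    "novels": ["Beloved (novel)", "The Road"],
    "films": ["Casablanca (film)"],
}

G = nx.Graph()
for domain, titles in domains.items():
    for title in titles:
        # Fetch the full revision history of one article
        revisions = wikipedia_histories.get_history(title)
        # Each revision records the editor who made it
        # (a `user` attribute is assumed here)
        editors = {rev.user for rev in revisions}
        for editor in editors:
            G.add_node(editor, kind="editor")
            G.add_node(title, kind="article", domain=domain)
            G.add_edge(editor, title)

# Project the editor-article bipartite graph onto editors:
# two editors are linked when they edited an article in common.
editor_nodes = {n for n, d in G.nodes(data=True) if d.get("kind") == "editor"}
editor_net = nx.bipartite.projected_graph(G, editor_nodes)
print(editor_net.number_of_nodes(), editor_net.number_of_edges())
```

In the projected editor network, edges that connect editors of articles in different domains are the cross-over relationships the project studies.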
The complete dataset is available on the Harvard Dataverse.