Data and Codes for the Gender Based Violence Paper

Download the required code and data from: https://drive.google.com/drive/folders/1wsIHm9ZFOt-DM9p8NiLaOQcjIhOWGHbR?usp=sharing

Read the following instructions for using the data and running the code.

Data

The data is organized as follows.
- Unzip Data_Github.zip
- The folder contains four subfolders, one per category of tweets considered: physical (physical violence tweets), sexual (sexual violence tweets), harmful (harmful practices tweets), and generic (generic tweets).
- The physical, sexual, and harmful folders each contain year-wise subfolders 2016, 2017, and 2018. Each year subfolder contains multiple files (01, 02, 03...) holding the required tweet IDs.
- The folder generic/ contains files 01, 02, 03 ..., which hold the required tweet IDs for generic tweets.
- The script /Codes/crawl_tweet.py can be used to collect the required metadata for the tweet IDs.
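
The folder layout described above can be walked with a short helper before handing the IDs to a crawler. The following is a minimal sketch, not part of the released code: the function names `iter_id_files` and `read_tweet_ids` are hypothetical, and it assumes each ID file is plain text with one tweet ID per line, which is not documented here.

```python
from pathlib import Path
from typing import Iterator, List, Optional, Tuple

# Category and year folder names are taken from the description above.
YEARLY_CATEGORIES = ("physical", "sexual", "harmful")
YEARS = ("2016", "2017", "2018")


def iter_id_files(data_root: Path) -> Iterator[Tuple[str, Optional[str], Path]]:
    """Yield (category, year, path) for every tweet-ID file under data_root.

    year is None for the generic category, which has no year subfolders.
    """
    for category in YEARLY_CATEGORIES:
        for year in YEARS:
            folder = data_root / category / year
            if folder.is_dir():
                for path in sorted(p for p in folder.iterdir() if p.is_file()):
                    yield category, year, path
    generic = data_root / "generic"
    if generic.is_dir():
        for path in sorted(p for p in generic.iterdir() if p.is_file()):
            yield "generic", None, path


def read_tweet_ids(path: Path) -> List[str]:
    """Read tweet IDs from one file.

    Assumption: one ID per line; blank lines are skipped.
    """
    with path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

For example, `for category, year, path in iter_id_files(Path("Data_Github")): ...` would visit every ID file, assuming the archive unzips into a folder named Data_Github.
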

Codes

Unzip Codes.zip, which contains all the required scripts. Each script is described below.
- preprocess.py : For preprocessing
- location_tagging.py : For tagging location
- divide_tweets_countrywise.py : For dividing tweets countrywise
- generate_countrywise_words.py : For generating words for further analysis
- get_correlation.py : For calculating correlation
- create_graph.py : For creating graph between countries
- get_correlation_catwise.py : For calculating correlation of different GBV categories
- create_desired_country_folder.py : For considering tweets from the considered countries
- calculate_document_frequency.py : For calculating document frequency of words
- create_random_tweets.py : For creating random tweets for each considered country
- plot_scatter_random.py : For plotting scatter plots
- get_most_freq_random.py : For getting most frequent words
- filter_harmkeywords.py : For filtering harmful practices keywords
- filter_phykeywords.py : For filtering physical violence keywords
- filter_sexkeywords.py : For filtering sexual violence keywords
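
As an illustration of one of the steps listed above, document frequency counts how many documents (here, tweets) contain a word at least once. The sketch below is a generic version of that idea and is not taken from calculate_document_frequency.py; the function name `document_frequency`, the sample tweets, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def document_frequency(documents: Iterable[List[str]]) -> Dict[str, int]:
    """Count, for each word, the number of documents containing it at least once."""
    counts: Counter = Counter()
    for tokens in documents:
        # set() ensures a word repeated within one document is counted once.
        counts.update(set(tokens))
    return dict(counts)


# Hypothetical sample tweets, tokenized on whitespace for illustration only.
tweets = [
    "stop violence now".split(),
    "violence is never ok".split(),
    "stop stop stop".split(),
]
df = document_frequency(tweets)
```

Here "stop" appears in two of the three sample tweets, so its document frequency is 2 even though it occurs four times in total.
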

INPUT FILES

Some scripts require input files, which are present in the Codes/ folder, such as HarmPrac.txt, world_gazeteer.csv, ...

OUTPUT FILES

A few interim output files are saved in the folder /Codes/outputs/