CoVaxxy-Misinfo: A Python repository from OSoMe

Reproducibility code for "Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal" Francesco Pierri, Brea Perry, Matthew R. DeVerna, Kai-Cheng Yang, Alessandro Flammini, Filippo Menczer and John Bryden. Nature Scientific Reports (2022) https://www.nature.com/articles/s41598-022-10070-w

Structure

  .
  ├── README.md 
  ├── config.ini 
  └── data
  │   ├── county_level
  │   ├── covid19
  │   ├── misc
  │   ├── state_level
  │   └── twitter
  ├── intermediate_files
  ├── logs
  ├── output_files
  └── src
  └── v1-streaming

config.ini - configuration file that specifies paths and filenames for the scripts
data - folder which contains subfolders with raw data at the state and county level, as well as Twitter data. Check related README files for further details
intermediate_files - folder which contains intermediate data to be merged
logs - folder which contains logs for the output of scripts
src - folder which contains scripts to be executed
v1-streaming - folder which contains the code used to stream the tweets

Keywords and Low-credibility sources

You can find keywords used to filter Twitter stream in src/keywords.txt. You can find the list of low-credibility sources in intermediate_files/low_credibility.csv. Check the Github repository associated to our CoVaxxy project for further details.

Instructions to replicate results

Clone this repository in your local directory.
Put Twitter data in the data/twitter folder. You must put .json files with one tweet json per line. Check the Github repository associated to our CoVaxxy project to see how to download our dataset and reconstruct it using Twitter API.
Go to the src folder and execute Python (we used version 3.8.5) scripts (see associated src/README.md file for further details) in the following order:
- python3 twitter_data_processing.py ../config.ini - to process Twitter data
- python3 get_cases_and_deaths.py ../config.ini - download COVID-19 number of cases and deaths; modify config.ini to set the date range.
- python3 aggregate_cases_and_deaths.py ../config.ini - aggregate COVID-19 numbers of cases and deaths for further use
- python3 merge_datasets.py ../config.ini - merge together intermediate data in a single dataframe to be used for correlation.
Run STATA script (src/stata_script.do) to get correlation results using output_files/master_data--{%Y-%m-%d__%H-%M-%S}.csv.
To do Granger Causality analysis, go to the src folder and execute Python (we used version 3.8.5) scripts (see associated src/README.md file for further details) in the following order:
- python3 get_temporal_data.py ../config.ini - to generate daily aggregates at a user level
- python3 generate_aggregate_files.py ../config.ini - to then aggregate by county or state
- python3 causality.py ../config.ini - to run causality analysis

Dependencies

covidcast - install by running pip install covidcast. Details can be found here
carmen - install by running pip install carmen. Details can be found here
urlexpander - install by running pip install urlexpander. Details can be found here

osome-iu/CoVaxxy-Misinfo

Structure

Keywords and Low-credibility sources

Instructions to replicate results

Dependencies