/CooRnet

Given a set of URLs, this packages detects coordinated link sharing behavior on social media and outputs the network of entities that performed such behaviour.

Primary LanguageRMIT LicenseMIT

Fabio Giglietto, Nicola Righetti, Luca Rossi

Final update

CooRnet requires CrowdTangle for data access. As of August 14, 2024, CrowdTangle is no longer available. Consequently, this repository is now archived. As alternatives for detecting coordinated sharing behavior on social media, we suggest CooRTweet for R users and coordination-network-toolkit for Pyhton users.

Overview

Given a set of URLs, this package detects coordinated link sharing behavior (CLSB) and outputs the network of entities that performed such behavior.

What do we mean by coordinated link sharing behavior?

CLSB refers to a specific coordinated activity performed by a network of Facebook pages, groups, and verified public profiles (Facebook public entities) that repeatedly share the same news articles in a very short time frame.

To identify such networks, we designed, implemented, and tested an algorithm that detects sets of Facebook public entities that perform CLSB by (1) estimating a time threshold that identifies URL shares performed by multiple distinguished entities within an unusually short period of time compared to the entire dataset, and (2) grouping the entities that repeatedly share the same news story within this coordination interval. The rationale is that while it may be common for several entities to share the same URLs, it is unlikely, unless a consistent coordination exists, that this occurs within the time threshold and repeatedly.

See also [references] for a more detailed description and real-world applications.

Installation

You can install CooRnet from GitHub.

# install.packages("devtools")

library("devtools")
devtools::install_github("fabiogiglietto/CooRnet")

API keys

CrowdTangle (required)

To use this package, you need to have a CrowdTangle API key. To add the key to your R environment file, follow these steps:

Open your .Renviron file, which is usually located in your home directory. If the file does not exist, create one and name it .Renviron.

Add a new line and enter your API key in the following format: CROWDTANGLE_API_KEY="YOUR_API_KEY"

Save the file and restart your R session to start using CooRnet.

OpenAI (optional)

You can also set up your credentials to access the OpenAI API, which returns a descriptive label for each cluster/component auto-generated by the GPT-3.5-Turbo OpenAI model.

The labels are obtained using the following system and user directives

"You are a researcher investigating coordinated and inauthentic behavior on Facebook and Instagram. Your objective is to generate concise, descriptive labels in English that capture the shared characteristics of clusters of Facebook or Instagram accounts.\n\n"

"I will supply a list of accounts for each cluster. For each account, you will receive a text that combines the account title and, if available, the account description. Identify the shared features among these accounts:"

[ADD ACCOUNTS TITLE/DESCRIPTION HERE]

"English label:"

To add the API key to your R environment file, follow these steps:

Open your .Renviron file.

Add two new lines and enter your API key in the following format: OPENAI_API_KEY = "YOUR_API_TOKEN"

Save the file and restart your R session to start using CooRnet.

NewsGuard (optional)

If you have an active subscription to the NewsGuard API, you can also set up your credentials to access it. This integration returns the average NewsGuard rating score obtained by the domains shared by a coordinated network. To add the API key to your R environment file, follow these steps:

Open your .Renviron file.

Add two new lines and enter your API key in the following format:

NG_KEY = "YOUR_API_KEY"

NG_SECRET = "YOUR_API_SECRET"

Save the file and restart your R session to start using CooRnet.

Usage

CooRnet requires a list of urls with respective publication date. Such list can be collected from various sources including CrowdTangle itself (e.g. Historical Data with a link post type filter), Twitter APIs (filtering for tweets with a url) and GDELT.

In this example, we use mediacloudr, an API Wrapper for the mediacloud.org API, to retrieve our initial list of news stories. Alternatively, is it also possible to use a CSV file exported from MediaCloud Explorer.

library("mediacloudr")

# add API key to your R environment file.
# Open your .Renviron file. The file is usually located in your home directory. If the file does not exist, just create one and name it .Renviron.
# Add a new line and enter your API key in the following format: MEDIACLOUD_API_KEY=<YOUR_API_KEY>.
# Save the file and restart your current R session to start using mediacloudr.

df <- get_story_list(rows = 100,
                           fq = "(text:coronavirus OR text:'covid-19' OR text:'SARS-CoV-2') AND (tags_id_media:186572515 OR tags_id_media:186572435 OR tags_id_media:186572516 OR tags_id_media:162546808 OR tags_id_media:162546809) AND publish_date:[2020-03-02T00:00:00.000Z TO 2020-04-03T00:00:00.000Z]")

# Alternative using a file exported from MediaCloud Explorer and uploaded into r
# df <- readr::read_csv("rawdata/MediaCloud_output.csv") # file exported from MediaCloud

library("CooRnet")

# get public shares of MediaCloud URLs on Facebook and Instagram (assumes a valid CROWDTANGLE_API_KEY in Env).
ct_shares.df <- get_ctshares(df, url_column = "url", date_column = "publish_date", platforms = "facebook,instagram", sleep_time = 1, nmax = 100, clean_urls = TRUE)

# get coordinated shares and networks
output <- get_coord_shares(ct_shares.df = ct_shares.df)

# creates the output objects in the environment
# 1. ct_shares_marked.df: a dataframe that contains all the retrieved social media shares plus an extra boolean variable (is_coordinated) that identify if the shares was coordinated.
# 2. highly_connected_g: igraph graph of coordinated networks of pages/groups/accounts
# 3. highly_connected_coordinated_entities: a dataframe that lists coordinated entities and corresponding component
get_outputs(output)

# save highly_connected_coordinated_entities in .CSV format for further analysis
write.csv(highly_connected_coordinated_entities, file = "highly_connected_coordinated_entities.csv")

# save highly_connected_g in .GRAPHML format for further analysis with Gephi or similar softwares
igraph::write.graph(highly_connected_g, file = "highly_connected_g.graphml", format = "graphml")

Advanced Usage

CooRnet also features additional functions that streamlines the analysis of coordinated networks.

Component Summary

get_component_summary creates an handy aggregated views of network properties including:

  1. the proportion of coordinated shares over the total shares (coorshare_ratio) 2) the average coordinated score (avg_cooRscore) 3) a measure of dispersion (gini) in the distribution of domains coordinatedly shared by the component (0-1) 4) the top 5 coordinatedly shared domains (ranked by n. of shares) 5) the total number coordinatedly shared of domains

Usage is simple:

comp_summary <- get_component_summary(output, labels = TRUE)

# save comp_summary in .CSV format for further analysis
write.csv(comp_summary, file = "comp_summary.csv")

Top URLs

get_top_coord_urls creates a ranked list of best performing URLs shared by coordinated networks. URLs can be sorted by various criteria including "engagement" wich is a sum of comments, shares and reactions (likes and other reactions) gathered by that URL.

You can customize the behaviour of get_top_coord_urls by specifing that you want a rank of URLs per componenent (top URLs for each network) or not and specify the number of URLs you want the function to return.

top_urls_per_components <- get_top_coord_urls(output, order_by = "engagement", component = TRUE, top = 10)

top_urls_all <- get_top_coord_urls(output, order_by = "engagement", component = FALSE, top = 10)

# save top_urls_per_components in .CSV format for further analysis
write.csv(top_urls_per_components, file = "top_urls_per_components.csv")

# save top_urls_all in .CSV format for further analysis
write.csv(top_urls_all, file = "top_urls_all.csv")

Top Shares

get_top_coord_shares creates a ranked list of best performing posts with link (shares) published by coordinated networks. Shares can be sorted by various criteria including "engagement" wich is of comments, shares and reactions (likes and other reactions) gathered by that post.

You can customize the behaviour of get_top_coord_shares by specifing that you want a rank of URLs per componenent (top shares for each network) or not and specify the number of URLs you want the function to return.

top_shares_per_components <- get_top_coord_shares(output, order_by = "engagement", component = TRUE, top = 10)

top_shares_all <- get_top_coord_shares(output, order_by = "engagement", component = FALSE, top = 10)

# save top_shares_per_components in .CSV format for further analysis
write.csv(top_shares_per_components, file = "top_shares_per_components.csv")

# save top_shares_all in .CSV format for further analysis
write.csv(top_shares_all, file = "top_shares_all.csv")

Acknowledgements

CooRnet has been developed as part of Patterns of Facebook Interactions around Insular and Cross-Partisan Media Sources in the Run-up of the 2018 Italian Election research project activities.

The project is supported, in part, by a grant by a grant from Social Science Research Council. Data and tools provided by Facebook.

References

  • Giglietto, F., Righetti, N., Rossi, L., & Marino, G. (2020). Coordinated Link Sharing Behavior as a Signal to Surface Sources of Problematic Information on Facebook. International Conference on Social Media and Society, 85--91. https://doi.org/10.1145/3400806.3400817

  • Giglietto, F., Righetti, N., Rossi, L., & Marino, G. (2020). It takes a village to manipulate the media: coordinated link sharing behavior during 2018 and 2019 Italian elections. Information, Communication and Society, 1--25. https://doi.org/10.1080/1369118X.2020.1739732

  • Giglietto, F., Righetti, N., & Marino, G. (2019). Understanding Coordinated and Inauthentic Link Sharing Behavior on Facebook in the Run-up to 2018 General Election and 2019 European Election in Italy. https://doi.org/10.31235/osf.io/3jteh