A Social Data Collector for Facebook Marketing API

rsocialwatcher

Query data from the Facebook Marketing API using R, with a focus on social science research.

Overview

This package facilitates querying data from the Facebook Marketing API. The package is inspired by pySocialWatcher, a similar package built for Python. Emerging research has shown that the Facebook Marketing API can provide useful data for social science research. For example, Facebook marketing data has been used for:

The package provides the following functions:

  • get_fb_parameter_ids(): To obtain IDs for targeting users by different characteristics, including (1) different parameter types (e.g., behaviors and interests) and (2) location keys (e.g., city keys).
  • get_location_coords(): To obtain coordinates and, when available, geometries of locations based on their location keys.
  • query_fb_marketing_api(): To query daily and monthly active users for specific locations and user attributes.
  • get_fb_suggested_radius(): To determine a suggested radius to reach enough people for a given coordinate pair.

Installation

The package can be installed via CRAN.

install.packages("rsocialwatcher")

You can install the development version of rsocialwatcher from GitHub with:

# install.packages("devtools")
devtools::install_github("worldbank/rsocialwatcher")

Facebook API Credentials

Using the Facebook Marketing API requires providing the following credentials:

  1. Token
  2. Version
  3. Creation act

Follow the instructions here to obtain these credentials.

Quickstart

Setup

library(rsocialwatcher)
library(dplyr)

# Define API version, creation act & token -------------------------------------
VERSION      <- "[ENTER HERE]" # Example: "v19.0"
CREATION_ACT <- "[ENTER HERE]"
TOKEN        <- "[ENTER HERE]"
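
Hard-coding a token in a script makes it easy to share by accident. As a minimal sketch, the credentials could instead be read from environment variables. The helper read_credential() and the variable names FB_API_VERSION, FB_CREATION_ACT and FB_TOKEN are hypothetical; the package does not require them.

```r
# Hypothetical helper: read a required setting from an environment variable
# and fail early with a clear message if it is missing.
read_credential <- function(var_name) {
  value <- Sys.getenv(var_name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", var_name, "' is not set.", call. = FALSE)
  }
  value
}

# Placeholder values so the sketch runs as-is; in practice these would be
# set outside the script (for example in ~/.Renviron), not in the code.
Sys.setenv(FB_API_VERSION  = "v19.0",
           FB_CREATION_ACT = "[ENTER HERE]",
           FB_TOKEN        = "[ENTER HERE]")

VERSION      <- read_credential("FB_API_VERSION")
CREATION_ACT <- read_credential("FB_CREATION_ACT")
TOKEN        <- read_credential("FB_TOKEN")
```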

Get dataframes of select parameter IDs

# Get dataframe of Facebook parameter IDs and descriptions ---------------------
## Interests and behaviors
interests_df <- get_fb_parameter_ids("interests", VERSION, TOKEN)
behaviors_df <- get_fb_parameter_ids("behaviors", VERSION, TOKEN)

head(behaviors_df[,1:3])
#>              id                              name      type
#> 1 6002714895372               Frequent travellers behaviors
#> 2 6002714898572             Small business owners behaviors
#> 3 6002764392172 Facebook Payments users (90 days) behaviors
#> 4 6003808923172         Early technology adopters behaviors
#> 5 6003986707172   Facebook access (OS): Windows 7 behaviors
#> 6 6003966451572    Facebook access (OS): Mac OS X behaviors

## Locations: countries
country_df <- get_fb_parameter_ids("country", VERSION, TOKEN)

head(country_df)
#>   key                 name    type country_code supports_region supports_city
#> 1  AD              Andorra country           AD            TRUE         FALSE
#> 2  AE United Arab Emirates country           AE            TRUE          TRUE
#> 3  AF          Afghanistan country           AF            TRUE         FALSE
#> 4  AG              Antigua country           AG            TRUE         FALSE
#> 5  AI             Anguilla country           AI            TRUE         FALSE
#> 6  AL              Albania country           AL            TRUE         FALSE

Query data for different location types

Example: Query Facebook users in the US

us_key <- country_df |> 
  filter(name == "United States") |> 
  pull(key)

query_fb_marketing_api(
  location_unit_type = "countries",
  location_keys      = us_key,
  version            = VERSION, 
  creation_act       = CREATION_ACT, 
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1    219899444                234900000                276400000
#>   location_unit_type location_types location_keys gender age_min age_max
#> 1          countries home or recent            US 1 or 2      18      65
#>     api_call_time_utc
#> 1 2024-05-04 17:03:38
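
The result reports a single daily active user (DAU) estimate and a lower and upper bound for monthly active users (MAU). A minimal sketch of summarizing such a row follows. The data frame is typed in by hand from the output above so that it runs without API credentials, and the column names estimate_mau_mid and dau_share are hypothetical additions, not package output.

```r
# Values copied from the example output above.
us_df <- data.frame(
  estimate_dau             = 219899444,
  estimate_mau_lower_bound = 234900000,
  estimate_mau_upper_bound = 276400000
)

# Midpoint of the reported MAU range (hypothetical column name).
us_df$estimate_mau_mid <- (us_df$estimate_mau_lower_bound +
                             us_df$estimate_mau_upper_bound) / 2

# Daily users as a share of the MAU midpoint (hypothetical column name).
us_df$dau_share <- us_df$estimate_dau / us_df$estimate_mau_mid

us_df$estimate_mau_mid
#> [1] 255650000
```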

Example: Query Facebook users around specific location

query_fb_marketing_api(
  location_unit_type = "coordinates",
  lat_lon            = c(40.712, -74.006),
  radius             = 5,
  radius_unit        = "kilometer",
  version            = VERSION, 
  creation_act       = CREATION_ACT, 
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1      1871425                  2400000                  2800000
#>   location_unit_type location_types radius radius_unit gender age_min age_max
#> 1        coordinates home or recent      5   kilometer 1 or 2      18      65
#>   latitude longitude   api_call_time_utc
#> 1   40.712   -74.006 2024-05-04 17:03:38

Obtain location coordinates/geometries

Example: Location coordinates and, when available, geometries can be obtained using the get_location_coords function.

get_location_coords(
  location_unit_type = "countries",
  location_keys      = c("US", "MX", "CA"),
  version            = VERSION,
  token              = TOKEN
)
#> Simple feature collection with 3 features and 7 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -179.2302 ymin: 14.53211 xmax: 179.8597 ymax: 83.11495
#> Geodetic CRS:  WGS 84
#>   key    type          name supports_city supports_region latitude longitude
#> 1  US country United States          TRUE            TRUE 40.00000 -100.0000
#> 2  MX country        Mexico          TRUE            TRUE 23.31667 -102.3667
#> 3  CA country        Canada          TRUE            TRUE 56.00000 -109.0000
#>                         geometry
#> 1 MULTIPOLYGON (((177.2906 52...
#> 2 MULTIPOLYGON (((-118.3256 2...
#> 3 MULTIPOLYGON (((-132.5786 5...

Example: In addition, when obtaining location IDs using the get_fb_parameter_ids function, we can directly add coordinates/geometries by setting add_location_coords to TRUE.

get_fb_parameter_ids(
  type = "region", 
  country_code = "US", 
  version = VERSION, 
  token = TOKEN,
  add_location_coords = TRUE) |>
  head()
#>    key          name   type country_code  country_name supports_region
#> 1 3866     Minnesota region           US United States            TRUE
#> 2 3855         Idaho region           US United States            TRUE
#> 3 3856      Illinois region           US United States            TRUE
#> 4 3864 Massachusetts region           US United States            TRUE
#> 5 3846      Arkansas region           US United States            TRUE
#> 6 3886         Texas region           US United States            TRUE
#>   supports_city latitude longitude                       geometry
#> 1          TRUE     46.0     -94.0 MULTIPOLYGON (((-97.1811 48...
#> 2          TRUE     45.0    -114.0 MULTIPOLYGON (((-117.0265 4...
#> 3          TRUE     40.0     -89.0             MULTIPOLYGON EMPTY
#> 4          TRUE     42.3     -71.8             MULTIPOLYGON EMPTY
#> 5          TRUE     34.8     -92.2 MULTIPOLYGON (((-94.26958 3...
#> 6          TRUE     31.0    -100.0             MULTIPOLYGON EMPTY

Get suggested radius

Facebook enables querying a specific location to determine a suggested radius to reach enough people (see Facebook documentation here). We can use the get_fb_suggested_radius function to get the suggested radius. The example below queries the suggested radius for Paris, France and Paris, Kentucky.

# Paris, France
get_fb_suggested_radius(location = c(48.856667, 2.352222),
                        version = VERSION,
                        token = TOKEN)
#>   suggested_radius distance_unit
#> 1                1     kilometer

# Paris, Kentucky
get_fb_suggested_radius(location = c(38.209682, -84.253915),
                        version = VERSION,
                        token = TOKEN)
#>   suggested_radius distance_unit
#> 1               25     kilometer

Query data for different user attributes

Example [One parameter]: Facebook users living in the US who primarily access Facebook using Mac OS X

beh_mac_id <- behaviors_df |> 
  filter(name == "Facebook access (OS): Mac OS X") |> 
  pull(id)

query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = "US",
  behaviors          = beh_mac_id,
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1       114053                   138100                   162500
#>   location_unit_type location_types location_keys     behaviors gender age_min
#> 1          countries home or recent            US 6003966451572 1 or 2      18
#>   age_max   api_call_time_utc
#> 1      65 2024-05-04 17:03:51

Example [One parameter]: Facebook users who are likely technology early adopters

beh_tech_id <- behaviors_df |> 
  filter(name == "Early technology adopters") |> 
  pull(id)

query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = "US",
  behaviors          = beh_tech_id,
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1     13957411                 14100000                 16600000
#>   location_unit_type location_types location_keys     behaviors gender age_min
#> 1          countries home or recent            US 6003808923172 1 or 2      18
#>   age_max   api_call_time_utc
#> 1      65 2024-05-04 17:03:52

Example [Two parameters, OR condition]: Facebook users living in the US who primarily access Facebook using Mac OS X OR who are likely technology early adopters. Vectors of IDs are used to specify OR conditions.

query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = "US",
  behaviors          = c(beh_mac_id, beh_tech_id),
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1     14107933                 14300000                 16800000
#>   location_unit_type location_types location_keys
#> 1          countries home or recent            US
#>                        behaviors gender age_min age_max   api_call_time_utc
#> 1 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:03:53

Example [Two parameters, AND condition]: Facebook users living in the US who primarily access Facebook using Mac OS X AND who are likely technology early adopters. Lists of IDs are used to specify AND conditions.

query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = "US",
  behaviors          = list(beh_mac_id, beh_tech_id),
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1        10883                    10500                    12400
#>   location_unit_type location_types location_keys
#> 1          countries home or recent            US
#>                         behaviors gender age_min age_max   api_call_time_utc
#> 1 6003966451572 and 6003808923172 1 or 2      18      65 2024-05-04 17:03:55
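
As a rough consistency check on the OR and AND examples, the inclusion-exclusion rule says that |A or B| = |A| + |B| - |A and B|. The sketch below applies it to the DAU estimates printed in the four outputs above. The numbers are typed in by hand, so no API call is made. Because the API returns estimates, only approximate agreement should be expected.

```r
# DAU estimates copied from the four example outputs above.
dau_mac  <- 114053    # Mac OS X only
dau_tech <- 13957411  # early technology adopters only
dau_or   <- 14107933  # Mac OS X OR early technology adopters
dau_and  <- 10883     # Mac OS X AND early technology adopters

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
dau_or_implied <- dau_mac + dau_tech - dau_and
dau_or_implied
#> [1] 14060581

# Relative gap (in percent) between the implied and the reported OR estimate
rel_gap <- abs(dau_or - dau_or_implied) / dau_or
round(100 * rel_gap, 2)
#> [1] 0.34
```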

Example [Two parameter types]: Across parameter types, AND conditions are used. The example below queries Facebook users living in the US who (1) primarily access Facebook using Mac OS X AND (2) are likely technology early adopters AND (3) are interested in computers. The “flex_target” parameters can be used to specify OR conditions across parameters; see here for examples.

int_comp_id <- interests_df |> 
  filter(name == "Computers (computers & electronics)") |> 
  pull(id)

query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = "US",
  behaviors          = list(beh_mac_id, beh_tech_id),
  interests          = int_comp_id,
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1         6538                     5900                     6900
#>   location_unit_type location_types location_keys     interests
#> 1          countries home or recent            US 6003404634364
#>                         behaviors gender age_min age_max   api_call_time_utc
#> 1 6003966451572 and 6003808923172 1 or 2      18      65 2024-05-04 17:03:57

Map Over Multiple Queries

Wrapping parameter values in the map_param function makes the query_fb_marketing_api function run a separate query for each value.

Example: Make queries for different countries.

country_df |> 
  filter(name %in% c("United States", "Canada", "Mexico")) |> 
  pull(key)
#> [1] "CA" "MX" "US"

query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = map_param("US", "CA", "MX"),
  behaviors          = c(beh_mac_id, beh_tech_id),
  interests          = int_comp_id,
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1      8388414                  7700000                  9100000
#> 2       886291                   834100                   981300
#> 3      2438417                  2200000                  2600000
#>   location_unit_type location_types location_keys     interests
#> 1          countries home or recent            US 6003404634364
#> 2          countries home or recent            CA 6003404634364
#> 3          countries home or recent            MX 6003404634364
#>                        behaviors gender age_min age_max   api_call_time_utc
#> 1 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:03:58
#> 2 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:03:59
#> 3 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:03:59

Example: Make queries for different countries and behaviors. In total, six queries are made (mapping over three countries and two behaviors).

query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = map_param("US", "CA", "MX"),
  behaviors          = map_param(beh_mac_id, beh_tech_id),
  interests          = int_comp_id,
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1        61908                    58200                    68500
#> 2        15795                    15000                    17700
#> 3        22012                    20900                    24500
#> 4      8311533                  7700000                  9000000
#> 5       869378                   817800                   962100
#> 6      2438287                  2200000                  2600000
#>   location_unit_type location_types location_keys     interests     behaviors
#> 1          countries home or recent            US 6003404634364 6003966451572
#> 2          countries home or recent            CA 6003404634364 6003966451572
#> 3          countries home or recent            MX 6003404634364 6003966451572
#> 4          countries home or recent            US 6003404634364 6003808923172
#> 5          countries home or recent            CA 6003404634364 6003808923172
#> 6          countries home or recent            MX 6003404634364 6003808923172
#>   gender age_min age_max   api_call_time_utc
#> 1 1 or 2      18      65 2024-05-04 17:04:00
#> 2 1 or 2      18      65 2024-05-04 17:04:00
#> 3 1 or 2      18      65 2024-05-04 17:04:01
#> 4 1 or 2      18      65 2024-05-04 17:04:03
#> 5 1 or 2      18      65 2024-05-04 17:04:04
#> 6 1 or 2      18      65 2024-05-04 17:04:04
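
The six rows above correspond to every combination of the three countries and the two behaviors. A minimal sketch with base R's expand.grid() reproduces that grid and its row order. It is only an illustration of the combinations and is not necessarily how the package builds its queries. The behavior IDs are copied from the output above.

```r
# All combinations of the mapped parameters; the first argument varies fastest,
# which matches the row order of the output above.
query_grid <- expand.grid(
  location_keys    = c("US", "CA", "MX"),
  behaviors        = c("6003966451572", "6003808923172"),
  stringsAsFactors = FALSE
)

nrow(query_grid)
#> [1] 6

query_grid
```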

Example: Make query for each country, for:

  • Those that access Facebook using Mac OS X OR who are likely technology early adopters
  • Those that access Facebook using Mac OS X AND who are likely technology early adopters

The example below illustrates how we can make complex queries (i.e., using AND and OR conditions) within map_param().

query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = map_param("US", "CA", "MX"),
  behaviors          = map_param(c(beh_mac_id, beh_tech_id), # OR condition
                                 list(beh_mac_id, beh_tech_id)), # AND condition
  interests          = int_comp_id,
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1      8388414                  7700000                  9100000
#> 2       886291                   834100                   981300
#> 3      2438417                  2200000                  2600000
#> 4         6538                     5900                     6900
#> 5         1391                     1300                     1500
#> 6         1699                     1500                     1700
#>   location_unit_type location_types location_keys     interests
#> 1          countries home or recent            US 6003404634364
#> 2          countries home or recent            CA 6003404634364
#> 3          countries home or recent            MX 6003404634364
#> 4          countries home or recent            US 6003404634364
#> 5          countries home or recent            CA 6003404634364
#> 6          countries home or recent            MX 6003404634364
#>                         behaviors gender age_min age_max   api_call_time_utc
#> 1  6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:05
#> 2  6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:05
#> 3  6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:05
#> 4 6003966451572 and 6003808923172 1 or 2      18      65 2024-05-04 17:04:06
#> 5 6003966451572 and 6003808923172 1 or 2      18      65 2024-05-04 17:04:07
#> 6 6003966451572 and 6003808923172 1 or 2      18      65 2024-05-04 17:04:07

Example: Make queries using vector as input. Below, we want to make a separate query for six countries. We define the following vector:

countries <- c("US", "CA", "MX", "FR", "GB", "ES")

However, for the below:

location_keys = map_param(countries)

map_param() views countries as one item (a vector of countries), so it makes just one query, returning the number of MAU/DAU across all six countries. To make a query for each item in the vector, we use map_param_vec().

Incorrect attempt to make a query for each country

countries <- c("US", "CA", "MX", "FR", "GB", "ES")

# INCORRECT: The below will make 1 query, querying the number of MAU/DAU across
# the six countries. The function interprets the input as the number of Facebook
# users in the US or Canada or Mexico, etc.
query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = map_param(countries),
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1    450330182                484900000                570500000
#>   location_unit_type location_types                    location_keys gender
#> 1          countries home or recent US or CA or MX or FR or GB or ES 1 or 2
#>   age_min age_max   api_call_time_utc
#> 1      18      65 2024-05-04 17:04:07

Correct approach to make a query for each country

countries <- c("US", "CA", "MX", "FR", "GB", "ES")

# CORRECT: The below will make 6 queries, one for each country.
query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = map_param_vec(countries),
  version            = VERSION,
  creation_act       = CREATION_ACT,
  token              = TOKEN)
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1    219899444                234900000                276400000
#> 2     26595379                 27700000                 32600000
#> 3     88476855                 95600000                112400000
#> 4     37889724                 39700000                 46800000
#> 5     46496564                 46800000                 55100000
#> 6     28909238                 30700000                 36100000
#>   location_unit_type location_types location_keys gender age_min age_max
#> 1          countries home or recent            US 1 or 2      18      65
#> 2          countries home or recent            CA 1 or 2      18      65
#> 3          countries home or recent            MX 1 or 2      18      65
#> 4          countries home or recent            FR 1 or 2      18      65
#> 5          countries home or recent            GB 1 or 2      18      65
#> 6          countries home or recent            ES 1 or 2      18      65
#>     api_call_time_utc
#> 1 2024-05-04 17:04:08
#> 2 2024-05-04 17:04:08
#> 3 2024-05-04 17:04:11
#> 4 2024-05-04 17:04:11
#> 5 2024-05-04 17:04:12
#> 6 2024-05-04 17:04:13
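
The difference between the two calls can be pictured with plain R lists. This is only an analogy and makes no claim about how map_param() and map_param_vec() are implemented: wrapping a vector in list() yields a single item, while as.list() yields one item per element.

```r
countries <- c("US", "CA", "MX", "FR", "GB", "ES")

# One item holding the whole vector: analogous to map_param(countries)
as_one_item <- list(countries)
length(as_one_item)
#> [1] 1

# One item per element: analogous to map_param_vec(countries)
as_separate_items <- as.list(countries)
length(as_separate_items)
#> [1] 6
```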

Using Multiple API Tokens

The Facebook API is rate limited: only a certain number of queries can be made in a given time period. If the rate limit is reached, query_fb_marketing_api will pause and then retry the query until it succeeds. As a result, query_fb_marketing_api can take a long time to complete when mapping over a large number of queries.
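
The pause-and-retry behavior described above can be pictured with a generic sketch. The helper retry_with_pause() and the stand-in flaky_call() are hypothetical and do not come from the package, whose internal logic is not shown here. Unlike the behavior described above, this sketch gives up after max_tries attempts.

```r
# Hypothetical helper: call `f` until it succeeds, pausing between attempts.
retry_with_pause <- function(f, pause_sec = 1, max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    # Wait before the next attempt (no need to wait after the last one).
    if (attempt < max_tries) Sys.sleep(pause_sec)
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result),
       call. = FALSE)
}

# Stand-in for a rate-limited call: fails twice, then succeeds.
n_calls <- 0
flaky_call <- function() {
  n_calls <<- n_calls + 1
  if (n_calls < 3) stop("rate limit reached")
  "ok"
}

out <- retry_with_pause(flaky_call, pause_sec = 0.1)
```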

Multiple API tokens can be used to minimize delay times from the function reaching its rate limit. To use multiple tokens, enter a vector with multiple entries for version, creation_act, and token.

Example: Using multiple API tokens

# We only have 1 token, but we'll pretend we have three
TOKEN_1 <- TOKEN
TOKEN_2 <- TOKEN
TOKEN_3 <- TOKEN

VERSION_1 <- VERSION
VERSION_2 <- VERSION
VERSION_3 <- VERSION

CREATION_ACT_1 <- CREATION_ACT
CREATION_ACT_2 <- CREATION_ACT
CREATION_ACT_3 <- CREATION_ACT

# Make query
query_fb_marketing_api(
  location_unit_type = "country",
  location_keys      = map_param("US", "CA", "MX", "GB", "FR", "DE", "IT"),
  behaviors          = c(beh_mac_id, beh_tech_id),
  interests          = int_comp_id,
  version            = c(VERSION_1,      VERSION_2,      VERSION_3) ,
  creation_act       = c(CREATION_ACT_1, CREATION_ACT_2, CREATION_ACT_3),
  token              = c(TOKEN_1,        TOKEN_2,        TOKEN_3) )
#>   estimate_dau estimate_mau_lower_bound estimate_mau_upper_bound
#> 1      8388414                  7700000                  9100000
#> 2       886291                   834100                   981300
#> 3      2438417                  2200000                  2600000
#> 4       942296                   890600                  1000000
#> 5       644377                   597500                   703000
#> 6       672929                   626500                   737000
#> 7       667252                   599300                   705000
#>   location_unit_type location_types location_keys     interests
#> 1          countries home or recent            US 6003404634364
#> 2          countries home or recent            CA 6003404634364
#> 3          countries home or recent            MX 6003404634364
#> 4          countries home or recent            GB 6003404634364
#> 5          countries home or recent            FR 6003404634364
#> 6          countries home or recent            DE 6003404634364
#> 7          countries home or recent            IT 6003404634364
#>                        behaviors gender age_min age_max   api_call_time_utc
#> 1 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:13
#> 2 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:14
#> 3 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:14
#> 4 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:15
#> 5 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:15
#> 6 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:16
#> 7 6003966451572 or 6003808923172 1 or 2      18      65 2024-05-04 17:04:16
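
Since version, creation_act and token are supplied as parallel vectors, it can help to check that they line up before starting a long run. The sketch below does that and then shows one possible way to spread seven queries over three credential sets in round-robin order. The credential values are placeholders, and the round-robin scheme is purely illustrative: how the package actually distributes queries across tokens is not described here.

```r
# Placeholder credentials (hypothetical values).
versions      <- c("v19.0", "v19.0", "v19.0")
creation_acts <- c("act_1", "act_2", "act_3")
tokens        <- c("token_1", "token_2", "token_3")

# The three vectors should have one entry per credential set.
stopifnot(length(versions) == length(creation_acts),
          length(creation_acts) == length(tokens))

location_keys <- c("US", "CA", "MX", "GB", "FR", "DE", "IT")

# Illustrative round-robin assignment of queries to credential sets.
cred_index <- (seq_along(location_keys) - 1) %% length(tokens) + 1

data.frame(location_keys, cred_index)
```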

Summary of Input Methods

The table below summarizes the different ways parameters can be entered into query_fb_marketing_api for different purposes. The table uses output from the following code.

behaviors_df <- get_fb_parameter_ids("behaviors", VERSION, TOKEN)

beh_mac_id <- behaviors_df |> 
  filter(name == "Facebook access (OS): Mac OS X") |> 
  pull(id)
  
beh_tech_id <- behaviors_df |> 
  filter(name == "Early technology adopters") |> 
  pull(id)
  
beh_ids <- c(beh_mac_id, beh_tech_id)

| Method | Function | Example input in query_fb_marketing_api(behaviors = [], ...) | Description |
|---|---|---|---|
| Or condition | c() | c(beh_mac_id, beh_tech_id) | Facebook users with beh_mac_id OR beh_tech_id behaviors |
| And condition | list() | list(beh_mac_id, beh_tech_id) | Facebook users with beh_mac_id AND beh_tech_id behaviors |
| Two queries [Way 1] | map_param() | map_param(beh_mac_id, beh_tech_id) | One query for Facebook users with beh_mac_id; second query for beh_tech_id |
| Two queries [Way 2] | map_param_vec() | map_param_vec(beh_ids) | One query for Facebook users with beh_mac_id; second query for beh_tech_id |

Usage

See this vignette for additional information and examples illustrating how to use the package.