/rrapply

rrapply: revisiting base-R's rapply

Primary LanguageR

{rrapply}: Revisiting R-base rapply()

CRAN version R-CMD-check codecov status Total Downloads

The minimal {rrapply}-package contains a single function rrapply(), providing an extended implementation of R-base’s rapply() function, which applies a function f to all elements of a nested list recursively and controls how to structure the returned result. rrapply() builds upon rapply()’s native C implementation and for this reason requires no external R-package dependencies.

Installation

# Install latest release from CRAN:
install.packages("rrapply")

# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("JorisChau/rrapply")

Cheat sheet

When to use rrapply()

List pruning and unnesting

how = "prune"

With base rapply(), there is no convenient way to prune or filter list elements from the input list object. The rrapply() function adds an option how = "prune" to prune all list elements not subject to application of f from a nested list,

library(rrapply)

## data: renewable energy per country in 2016 (% of total energy consumption)
data("renewable_energy_by_country")

## subset countries and areas in Oceania
renewable_oceania <- renewable_energy_by_country[["World"]]["Oceania"]
str(renewable_oceania, list.len = 3, give.attr = FALSE)

#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 6
#>   .. ..$ Australia                        : num 9.32
#>   .. ..$ Christmas Island                 : logi NA
#>   .. ..$ Cocos (Keeling) Islands          : logi NA
#>   .. .. [list output truncated]
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 8
#>   .. ..$ Guam                                : num 3.03
#>   .. ..$ Kiribati                            : num 45.4
#>   .. ..$ Marshall Islands                    : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]
## drop all logical NA's while preserving list structure 
rrapply(
  renewable_oceania,
  f = \(x) x,
  classes = "numeric",
  how = "prune"
) |>
  str(list.len = 3, give.attr = FALSE)

#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 2
#>   .. ..$ Australia  : num 9.32
#>   .. ..$ New Zealand: num 32.8
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 7
#>   .. ..$ Guam                            : num 3.03
#>   .. ..$ Kiribati                        : num 45.4
#>   .. ..$ Marshall Islands                : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

how = "flatten"

Instead, use how = "flatten" to return a flattened unnested version of the pruned list,

## drop all logical NA's and return unnested list
rrapply(
  renewable_oceania,
  f = \(x) x,
  classes = "numeric",
  how = "flatten"
) |>
  head(n = 10)

#>        Australia      New Zealand             Fiji    New Caledonia 
#>             9.32            32.76            24.36             4.03 
#> Papua New Guinea  Solomon Islands          Vanuatu             Guam 
#>            50.34            65.73            33.67             3.03 
#>         Kiribati Marshall Islands 
#>            45.43            11.75

Hint: the options argument allows to avoid coercion of the flattened list to a vector and/or to include all parent list names in the result.

## flatten to simple list with full names
rrapply(
  renewable_oceania,
  f = \(x) x,
  classes = "numeric",
  how = "flatten",
  options = list(namesep = ".", simplify = FALSE)
) |>
  str(list.len = 5, give.attr = FALSE)

#> List of 22
#>  $ Oceania.Australia and New Zealand.Australia        : num 9.32
#>  $ Oceania.Australia and New Zealand.New Zealand      : num 32.8
#>  $ Oceania.Melanesia.Fiji                             : num 24.4
#>  $ Oceania.Melanesia.New Caledonia                    : num 4.03
#>  $ Oceania.Melanesia.Papua New Guinea                 : num 50.3
#>   [list output truncated]

how = "melt"

Or, use how = "melt" to return a melted data.frame of the pruned list similar in format to reshape2::melt() applied to a nested list.

## drop all logical NA's and return melted data.frame
oceania_melt <- rrapply(
  renewable_oceania,
  f = \(x) x,
  classes = "numeric",
  how = "melt"
)
head(oceania_melt)

#>        L1                        L2               L3 value
#> 1 Oceania Australia and New Zealand        Australia  9.32
#> 2 Oceania Australia and New Zealand      New Zealand 32.76
#> 3 Oceania                 Melanesia             Fiji 24.36
#> 4 Oceania                 Melanesia    New Caledonia  4.03
#> 5 Oceania                 Melanesia Papua New Guinea 50.34
#> 6 Oceania                 Melanesia  Solomon Islands 65.73

A melted data.frame can be used to reconstruct a nested list with how = "unmelt",

## reconstruct nested list from melted data.frame
rrapply(oceania_melt, how = "unmelt") |>
  str(list.len = 3, give.attr = FALSE)

#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 2
#>   .. ..$ Australia  : num 9.32
#>   .. ..$ New Zealand: num 32.8
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 7
#>   .. ..$ Guam                            : num 3.03
#>   .. ..$ Kiribati                        : num 45.4
#>   .. ..$ Marshall Islands                : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

how = "bind"

Nested lists containing repeated observations can be unnested with how = "bind", which returns a wide data.frame similar in format to dplyr::bind_rows() applied to a list of data.frames or repeated application of tidyr::unnest_wider() to a nested data.frame.

## data: nested list of Pokemon properties in Pokemon GO
data("pokedex")
str(pokedex, list.len = 3)

#> List of 1
#>  $ pokemon:List of 151
#>   ..$ :List of 16
#>   .. ..$ id            : int 1
#>   .. ..$ num           : chr "001"
#>   .. ..$ name          : chr "Bulbasaur"
#>   .. .. [list output truncated]
#>   ..$ :List of 17
#>   .. ..$ id            : int 2
#>   .. ..$ num           : chr "002"
#>   .. ..$ name          : chr "Ivysaur"
#>   .. .. [list output truncated]
#>   ..$ :List of 15
#>   .. ..$ id            : int 3
#>   .. ..$ num           : chr "003"
#>   .. ..$ name          : chr "Venusaur"
#>   .. .. [list output truncated]
#>   .. [list output truncated]
## unnest list to wide data.frame
rrapply(pokedex, how = "bind")[, c(1:3, 5:8)] |>
  head(n = 9)

#>   id num       name          type height   weight            candy
#> 1  1 001  Bulbasaur Grass, Poison 0.71 m   6.9 kg  Bulbasaur Candy
#> 2  2 002    Ivysaur Grass, Poison 0.99 m  13.0 kg  Bulbasaur Candy
#> 3  3 003   Venusaur Grass, Poison 2.01 m 100.0 kg  Bulbasaur Candy
#> 4  4 004 Charmander          Fire 0.61 m   8.5 kg Charmander Candy
#> 5  5 005 Charmeleon          Fire 1.09 m  19.0 kg Charmander Candy
#> 6  6 006  Charizard  Fire, Flying 1.70 m  90.5 kg Charmander Candy
#> 7  7 007   Squirtle         Water 0.51 m   9.0 kg   Squirtle Candy
#> 8  8 008  Wartortle         Water 0.99 m  22.5 kg   Squirtle Candy
#> 9  9 009  Blastoise         Water 1.60 m  85.5 kg   Squirtle Candy

Hint: set options = list(namecols = TRUE) to include the parent list names associated to each row in the wide data.frame as individual columns L1, L2, etc.

## bind to data.frame including parent columns
rrapply(pokedex, how = "bind", options = list(namecols = TRUE))[, c(1:5, 7:10)] |>
  head(n = 6)

#>        L1 L2 id num       name          type height   weight            candy
#> 1 pokemon  1  1 001  Bulbasaur Grass, Poison 0.71 m   6.9 kg  Bulbasaur Candy
#> 2 pokemon  2  2 002    Ivysaur Grass, Poison 0.99 m  13.0 kg  Bulbasaur Candy
#> 3 pokemon  3  3 003   Venusaur Grass, Poison 2.01 m 100.0 kg  Bulbasaur Candy
#> 4 pokemon  4  4 004 Charmander          Fire 0.61 m   8.5 kg Charmander Candy
#> 5 pokemon  5  5 005 Charmeleon          Fire 1.09 m  19.0 kg Charmander Candy
#> 6 pokemon  6  6 006  Charizard  Fire, Flying 1.70 m  90.5 kg Charmander Candy

Condition function

Base rapply() allows to apply a function f to list elements of certain types or classes via the classes argument. rrapply() generalizes this option via an additional condition argument, which accepts any function to use as a condition or predicate to apply f to a subset of list elements.

## drop all NA elements using condition 
rrapply(
  renewable_oceania,
  condition = \(x) !is.na(x),
  f = \(x) x,
  how = "prune"
) |>
  str(list.len = 3, give.attr = FALSE)

#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 2
#>   .. ..$ Australia  : num 9.32
#>   .. ..$ New Zealand: num 32.8
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 7
#>   .. ..$ Guam                            : num 3.03
#>   .. ..$ Kiribati                        : num 45.4
#>   .. ..$ Marshall Islands                : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]
## filter all countries with values > 85%
rrapply(
  renewable_energy_by_country,
  condition = \(x) x > 85,
  how = "prune"
) |>
  str(give.attr = FALSE)

#> List of 1
#>  $ World:List of 1
#>   ..$ Africa:List of 1
#>   .. ..$ Sub-Saharan Africa:List of 3
#>   .. .. ..$ Eastern Africa:List of 7
#>   .. .. .. ..$ Burundi                    : num 89.2
#>   .. .. .. ..$ Ethiopia                   : num 91.9
#>   .. .. .. ..$ Rwanda                     : num 86
#>   .. .. .. ..$ Somalia                    : num 94.7
#>   .. .. .. ..$ Uganda                     : num 88.6
#>   .. .. .. ..$ United Republic of Tanzania: num 86.1
#>   .. .. .. ..$ Zambia                     : num 88.5
#>   .. .. ..$ Middle Africa :List of 2
#>   .. .. .. ..$ Chad                            : num 85.3
#>   .. .. .. ..$ Democratic Republic of the Congo: num 97
#>   .. .. ..$ Western Africa:List of 1
#>   .. .. .. ..$ Guinea-Bissau: num 86.5

Special arguments .xname, .xpos, .xparents and .xsiblings

With base rapply(), the f function only has access to the content of the list element under evaluation, and there is no convenient way to access its name or location in the nested list from inside f. rrapply() defines the special arguments .xname, .xpos, .xparents, .xsiblings inside the f and condition functions (in addition to the principal function argument):

  • .xname evaluates to the name of the list element;
  • .xpos evaluates to the position of the element in the nested list structured as an integer vector;
  • .xparents evaluates to a vector of parent list names in the path to the current list element;
  • .xsiblings evaluates to the parent list containing the current list element and its direct siblings.
## apply f based on element's name
rrapply(
  renewable_oceania,
  condition = \(x) !is.na(x),
  f = \(x, .xname) sprintf("Renewable energy in %s: %.2f%%", .xname, x),
  how = "flatten"
) |>
  head(n = 5)

#>                                      Australia 
#>         "Renewable energy in Australia: 9.32%" 
#>                                    New Zealand 
#>      "Renewable energy in New Zealand: 32.76%" 
#>                                           Fiji 
#>             "Renewable energy in Fiji: 24.36%" 
#>                                  New Caledonia 
#>     "Renewable energy in New Caledonia: 4.03%" 
#>                               Papua New Guinea 
#> "Renewable energy in Papua New Guinea: 50.34%"

## filter elements by name 
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xname) .xname %in% c("Belgium", "Netherlands", "Luxembourg"),
  how = "prune"
) |>
  str(give.attr = FALSE)

#> List of 1
#>  $ World:List of 1
#>   ..$ Europe:List of 1
#>   .. ..$ Western Europe:List of 3
#>   .. .. ..$ Belgium    : num 9.14
#>   .. .. ..$ Luxembourg : num 13.5
#>   .. .. ..$ Netherlands: num 5.78

## filter European countries > 50% using .xpos
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xpos) identical(.xpos[1:2], c(1L, 5L)) && x > 50,
  how = "prune"
) |>
  str(give.attr = FALSE)

#> List of 1
#>  $ World:List of 1
#>   ..$ Europe:List of 2
#>   .. ..$ Northern Europe:List of 3
#>   .. .. ..$ Iceland: num 78.1
#>   .. .. ..$ Norway : num 59.5
#>   .. .. ..$ Sweden : num 51.4
#>   .. ..$ Western Europe :List of 1
#>   .. .. ..$ Liechtenstein: num 62.9

## filter European countries > 50% using .xparents
rrapply(
  renewable_energy_by_country, 
  condition = \(x, .xparents) "Europe" %in% .xparents && x > 50,
  how = "prune"
) |>
  str(give.attr = FALSE)

#> List of 1
#>  $ World:List of 1
#>   ..$ Europe:List of 2
#>   .. ..$ Northern Europe:List of 3
#>   .. .. ..$ Iceland: num 78.1
#>   .. .. ..$ Norway : num 59.5
#>   .. .. ..$ Sweden : num 51.4
#>   .. ..$ Western Europe :List of 1
#>   .. .. ..$ Liechtenstein: num 62.9

## return position of element in list
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xname) .xname == "Sweden",
  f = \(x, .xpos) .xpos,
  how = "flatten"
)

#> $Sweden
#> [1]  1  5  2 14

## return siblings of element in list
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xsiblings) "Sweden" %in% names(.xsiblings),
  how = "flatten"
) |>
  head(n = 5)

#> Aland Islands       Denmark       Estonia Faroe Islands       Finland 
#>            NA         33.06         26.55          4.24         42.03

## filter elements and unnest list  
rrapply(
  pokedex,
  condition = \(x, .xpos, .xname) length(.xpos) < 4 & .xname %in% c("num", "name", "type"),
  how = "bind"
) |>
  head()

#>   num       name          type
#> 1 001  Bulbasaur Grass, Poison
#> 2 002    Ivysaur Grass, Poison
#> 3 003   Venusaur Grass, Poison
#> 4 004 Charmander          Fire
#> 5 005 Charmeleon          Fire
#> 6 006  Charizard  Fire, Flying

Modifying list elements

By default, both base rapply() and rrapply() recurse into any “list-like” element. Set classes = "list" in rrapply() to override and apply f to any list element (i.e. a sublist) that satisfies the condition argument. This can be useful to e.g. collapse sublists or calculate summary statistics across elements in a nested list:

## calculate mean value of Europe
rrapply(
  renewable_energy_by_country,  
  condition = \(x, .xname) .xname == "Europe",
  f = \(x) mean(unlist(x), na.rm = TRUE),
  classes = "list",
  how = "flatten"
)

#>   Europe 
#> 22.36565

## calculate mean value for each continent
## (Antartica's value is missing)
rrapply(
  renewable_energy_by_country, 
  condition = \(x, .xpos) length(.xpos) == 2,
  f = \(x) mean(unlist(x), na.rm = TRUE),
  classes = "list"
) |>
  str(give.attr = FALSE)

#> List of 1
#>  $ World:List of 6
#>   ..$ Africa    : num 54.3
#>   ..$ Americas  : num 18.2
#>   ..$ Antarctica: logi NA
#>   ..$ Asia      : num 17.9
#>   ..$ Europe    : num 22.4
#>   ..$ Oceania   : num 17.8

## simplify pokemon evolutions to character vectors 
rrapply(
  pokedex,
  condition = \(x, .xname) .xname %in% c("name", "next_evolution", "prev_evolution"), 
  f = \(x) if(is.list(x)) sapply(x, `[[`, "name") else x,
  classes = c("character", "list"),
  how = "bind"
) |>
  head(n = 9)

#>         name        next_evolution         prev_evolution
#> 1  Bulbasaur     Ivysaur, Venusaur                     NA
#> 2    Ivysaur              Venusaur              Bulbasaur
#> 3   Venusaur                    NA     Bulbasaur, Ivysaur
#> 4 Charmander Charmeleon, Charizard                     NA
#> 5 Charmeleon             Charizard             Charmander
#> 6  Charizard                    NA Charmander, Charmeleon
#> 7   Squirtle  Wartortle, Blastoise                     NA
#> 8  Wartortle             Blastoise               Squirtle
#> 9  Blastoise                    NA    Squirtle, Wartortle

Hint: as data.frames are also list-like objects, rrapply() applies f to individual data.frame columns by default. Set classes = "data.frame" to avoid this behavior and apply the f and condition functions to complete data.frame objects instead of individual data.frame columns.

Recursive list updating

how = "recurse"

If classes = "list" and how = "recurse", rrapply() applies the f function to any list element that satisfies the condition argument similar to the previous section, but recurses further into any updated list-like element after application of f. This can be useful to e.g. recursively update the class or other attributes of all elements in a nested list:

## recursively remove all list attributes
rrapply(
  renewable_oceania,
  f = \(x) c(x),
  classes = c("list", "ANY"),
  how = "recurse"
) |>
  str(list.len = 3, give.attr = TRUE)

#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 6
#>   .. ..$ Australia                        : num 9.32
#>   .. ..$ Christmas Island                 : logi NA
#>   .. ..$ Cocos (Keeling) Islands          : logi NA
#>   .. .. [list output truncated]
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 8
#>   .. ..$ Guam                                : num 3.03
#>   .. ..$ Kiribati                            : num 45.4
#>   .. ..$ Marshall Islands                    : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

how = "names"

The option how = "names" is a special case of how = "recurse", where the value of f is used to replace the name of the evaluated list element instead of its content (as with all other how options). By default, how = "names" uses classes = c("list", "ANY") in order to allow updating of all names in the nested list.

## recursively replace all names by M49-codes
rrapply(
  renewable_oceania,
  f = \(x) attr(x, "M49-code"),
  how = "names"
) |>
  str(list.len = 3, give.attr = FALSE)

#> List of 1
#>  $ 009:List of 4
#>   ..$ 053:List of 6
#>   .. ..$ 036: num 9.32
#>   .. ..$ 162: logi NA
#>   .. ..$ 166: logi NA
#>   .. .. [list output truncated]
#>   ..$ 054:List of 5
#>   .. ..$ 242: num 24.4
#>   .. ..$ 540: num 4.03
#>   .. ..$ 598: num 50.3
#>   .. .. [list output truncated]
#>   ..$ 057:List of 8
#>   .. ..$ 316: num 3.03
#>   .. ..$ 296: num 45.4
#>   .. ..$ 584: num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

Expression objects

Base rapply() does not include recursion for expression objects. In contrast rrapply() supports recursion of call objects and expression vectors, which are treated as nested lists based on their abstract syntax trees. As such, all functionality that applies to nested lists extends directly to call objects and expression vectors.

To update the abstract syntax tree of a call object, use how = "replace":

## language object
(lang <- quote(y <- x <- 1 + TRUE))

#> y <- x <- 1 + TRUE

## replace logicals by integers 
rrapply(lang, classes = "logical", f = as.numeric, how = "replace")

#> y <- x <- 1 + 1

To update the abstract syntax tree and return it as a nested list, use how = "list":

## update and decompose call object
rrapply(lang, f = function(x) ifelse(is.logical(x), as.numeric(x), x), how = "list") |>
  str()

#> List of 3
#>  $ : symbol <-
#>  $ : symbol y
#>  $ :List of 3
#>   ..$ : symbol <-
#>   ..$ : symbol x
#>   ..$ :List of 3
#>   .. ..$ : symbol +
#>   .. ..$ : num 1
#>   .. ..$ : num 1

The modes how = "prune", how = "flatten" and how = "melt" return the pruned abstract syntax tree as: a nested list, a flattened list and a melted data.frame respectively. This is identical to application of rrapply() to the abstract syntax tree formatted as a nested list.

To illustrate, we return all names (i.e. symbols) in the abstract syntax tree that not part of base R:

## expression vector
expr <- expression(y <- x <- 1, f(g(2 * pi)))
is_new_name <- function(x) !exists(as.character(x), envir = baseenv())

## prune and decompose expression
rrapply(expr, classes = "name", condition = is_new_name, how = "prune") |>
  str()

#> List of 2
#>  $ :List of 2
#>   ..$ : symbol y
#>   ..$ :List of 1
#>   .. ..$ : symbol x
#>  $ :List of 2
#>   ..$ : symbol f
#>   ..$ :List of 1
#>   .. ..$ : symbol g

## prune and flatten expression
rrapply(expr, classes = "name", condition = is_new_name, how = "flatten") |>
  str()

#> List of 4
#>  $ : symbol y
#>  $ : symbol x
#>  $ : symbol f
#>  $ : symbol g

## prune and melt expression
rrapply(expr, classes = "name", condition = is_new_name, f = as.character, how = "melt")

#>   L1 L2   L3 value
#> 1  1  2 <NA>     y
#> 2  1  3    2     x
#> 3  2  1 <NA>     f
#> 4  2  2    1     g

For more details and examples on how to use the rrapply() function see the accompanying package vignette in the vignettes folder or the Articles section at https://jorischau.github.io/rrapply/.