JorisChau/rrapply

Option to remove prefix with how="bind"

psads-git opened this issue · 4 comments

Take your own example:

pokedex_wide <- rrapply(pokedex, how = "bind")

The column names of the resulting data frame all have the prefix pokemon. It would be useful to have an easy way to remove this prefix.

Thanks in advance!

In the example, pokemon is the name of the first list layer. To remove the pokemon prefix, you can just index into the first element of the list:

library(rrapply)
data("pokedex")
pokedex_wide <- rrapply(pokedex[[1]], how = "bind")
head(pokedex_wide[, 1:5], n = 5)
#>   id num       name                                              img
#> 1  1 001  Bulbasaur http://www.serebii.net/pokemongo/pokemon/001.png
#> 2  2 002    Ivysaur http://www.serebii.net/pokemongo/pokemon/002.png
#> 3  3 003   Venusaur http://www.serebii.net/pokemongo/pokemon/003.png
#> 4  4 004 Charmander http://www.serebii.net/pokemongo/pokemon/004.png
#> 5  5 005 Charmeleon http://www.serebii.net/pokemongo/pokemon/005.png
#>            type
#> 1 Grass, Poison
#> 2 Grass, Poison
#> 3 Grass, Poison
#> 4          Fire
#> 5          Fire

I am not sure there is much added value in having a dedicated option for removing this prefix, or is there something I am missing?

Thanks, Joris, for your answer. In the example below, though, I guess it is not that easy to remove the value. prefix:

library(tidyverse)
library(rrapply)

eg <- tibble(user_id = c("10001", "10002"),
             data = c("{'key': 'age', 'value': {'max': 40, 'min': 31}}", 
                      "{'key': 'age', 'value': {'max': 30, 'min': 21}}"))

map(gsub("'", "\"", eg$data), jsonlite::fromJSON) %>% 
  rrapply(condition = \(x, .xname) .xname %in% c("min", "max"), how="bind")

#>   value.max value.min
#> 1        40        31
#> 2        30        21
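
Independently of any rrapply option, a prefix such as value. can also be stripped after the fact by renaming the columns in base R. A minimal sketch, assuming a data frame shaped like the output above; strip_prefix is a hypothetical helper written for illustration and is not part of rrapply:

```r
# Hypothetical helper (not part of rrapply): drop a literal prefix from column names.
strip_prefix <- function(df, prefix) {
  # startsWith() does a fixed-string match at the start of each name,
  # so the "." in "value." is not treated as a regex wildcard
  has_prefix <- startsWith(names(df), prefix)
  names(df)[has_prefix] <- substring(names(df)[has_prefix], nchar(prefix) + 1L)
  df
}

# Stand-in for the rrapply(..., how = "bind") result shown above
wide <- data.frame(value.max = c(40, 30), value.min = c(31, 21))

names(strip_prefix(wide, "value."))
#> [1] "max" "min"
```

Note that stripping prefixes this way can produce duplicate column names when different parent lists share the same child names, so it is best applied to one prefix at a time.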

@psads-git: this has been addressed in release v1.2.5.

The data.frame columns in how = "bind" now only concatenate child list names instead of full path names. In your previous example, the column names depend on which depth layer is expanded into individual data.frame columns. By default, this is the minimal depth across leaf nodes (here, the key/value layer):

library(rrapply)

eg <- list(
  list(key = "age", value = list(max = 40L, min = 31L)), 
  list(key = "age", value = list(max = 30L, min = 21L))
)

str(eg)
#> List of 2
#>  $ :List of 2
#>   ..$ key  : chr "age"
#>   ..$ value:List of 2
#>   .. ..$ max: int 40
#>   .. ..$ min: int 31
#>  $ :List of 2
#>   ..$ key  : chr "age"
#>   ..$ value:List of 2
#>   .. ..$ max: int 30
#>   .. ..$ min: int 21

## bind at depth 2
rrapply(eg, how = "bind")
#>   key value.max value.min
#> 1 age        40        31
#> 2 age        30        21

In this case, the column names start from the names key and value (for example, value.max).

If we bind child lists at the (deeper) list layer of min and max, the column names will start from min... and max...:

## bind at depth 3
rrapply(eg, how = "bind", options = list(coldepth = 3))
#>   max min
#> 1  40  31
#> 2  30  21

The parent list names can still be added to the wide data.frame as individual columns by setting options = list(namecols = TRUE):

## bind at depth 3 + include name columns
rrapply(eg, how = "bind", options = list(coldepth = 3, namecols = TRUE))
#>   L1    L2 max min
#> 1  1 value  40  31
#> 2  2 value  30  21

Thanks a lot, Joris! It is great now: A very nice improvement!