The goal of varnames is to help with creating consistent variable names. Good variable names have many advantages, particularly when managing larger datasets.
Consistent names also make it easier to transform datasets to “tidy data”, see Wickham (2013). I’ll show some examples of how powerful consistent variable names can be, because they make it easier to automate things and to use regular expressions.
This is work in progress. You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("milanwiedemann/varnames")
In most cases there might be a very easy way to create consistent variable names, simply by using the paste() function. This function comes with base R, which means that no further packages are needed, and it is very flexible!
# Create vector of a measure with 3 time points (t)
paste("measure_", "t", 1:3, sep = "")
#> [1] "measure_t1" "measure_t2" "measure_t3"
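As a minimal sketch of why names like these are useful (the data frame dat and its values are made up for illustration and are not part of varnames), a single regular expression can pick out all time points of a measure in base R:

```r
# Hypothetical data set with consistently named columns
dat <- data.frame(
  id = 1:3,
  measure_t1 = c(10, 12, 9),
  measure_t2 = c(11, 14, 8),
  measure_t3 = c(13, 15, 7),
  other_t1 = c(1, 0, 1)
)

# Select every time point of "measure" with one regular expression
measure_cols <- grep("^measure_t[0-9]+$", names(dat), value = TRUE)
measure_cols

# Automate a computation over the selected columns
dat$measure_mean <- rowMeans(dat[measure_cols])
```

Because the pattern is anchored at both ends, columns such as other_t1 are not picked up by accident.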
Sometimes more than one specification needs to be added to a variable name. This is still possible using base functions, but it gets a bit tricky, so I built functions that help with this.
Similar to the paste() example above, the create_var_names() function creates a vector (or list) of names for a measure with 3 time points (“t”), specified by the arguments str = "t" and n = 3.
library(varnames)
library(tidyverse)
#> ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
#> ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
#> ✓ tibble 3.0.1 ✓ dplyr 0.8.5
#> ✓ tidyr 1.0.2 ✓ stringr 1.4.0
#> ✓ readr 1.3.1 ✓ forcats 0.5.0
#> ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
# Create vector of a measure with 3 time points (t)
create_var_names(var_name = "measure", str = "t", n = 3, unlist = TRUE)
#> [1] "measure_t1" "measure_t2" "measure_t3"
The add_specifier() function adds a further specification to the variables created in the previous step, here "i" for item. Depending on which output format is most suitable for further use, this function can return the output in different ways.
The example below adds two items (“i”) for each time point (“t”) and returns a list grouped by the specifier from the previous function, using sort = "previous". This creates a list with 3 vectors, one for each time point specified in the previous function.
create_var_names(var_name = "measure", str = "t", n = 3) %>%
  add_specifier(str = "i", n = 2, sort = "previous")
#> [[1]]
#> [1] "measure_t1_i1" "measure_t1_i2"
#>
#> [[2]]
#> [1] "measure_t2_i1" "measure_t2_i2"
#>
#> [[3]]
#> [1] "measure_t3_i1" "measure_t3_i2"
The output list can also be sorted by the new specifier that was added, using the argument sort = "current". Now a list with two vectors is returned, one vector for each item that was added.
create_var_names(var_name = "measure", str = "t", n = 3) %>%
  add_specifier(str = "i", n = 2, sort = "current")
#> [[1]]
#> [1] "measure_t1_i1" "measure_t2_i1" "measure_t3_i1"
#>
#> [[2]]
#> [1] "measure_t1_i2" "measure_t2_i2" "measure_t3_i2"
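For comparison, the two groupings shown so far can be sketched in base R. This is only an illustration of the idea, not how varnames is implemented, and the names n_time, n_item, by_time and by_item are made up for this example:

```r
n_time <- 3  # number of time points ("t")
n_item <- 2  # number of items ("i")

# Grouped by time point: one vector per "t", items vary within it
by_time <- lapply(seq_len(n_time), function(tp) {
  paste0("measure_t", tp, "_i", seq_len(n_item))
})

# Grouped by item: one vector per "i", time points vary within it
by_item <- lapply(seq_len(n_item), function(it) {
  paste0("measure_t", seq_len(n_time), "_i", it)
})

# Flatten either list into a single character vector
flat_by_item <- unlist(by_item)
```

Every further specifier needs another level of looping, which is where the base R approach starts to get tricky.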
A vector instead of a list can be returned using the argument unlist = TRUE.
create_var_names(var_name = "measure", str = "t", n = 3) %>%
  add_specifier(str = "i", n = 2, sort = "current", unlist = TRUE)
#> [1] "measure_t1_i1" "measure_t2_i1" "measure_t3_i1" "measure_t1_i2"
#> [5] "measure_t2_i2" "measure_t3_i2"
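The order of the names can still be changed when using unlist = TRUE by changing the sort argument. For comparison, the example below uses sort = "previous" with unlist = FALSE, which returns a list grouped by time point.
create_var_names(var_name = "measure", str = "t", n = 3) %>%
  add_specifier(str = "i", n = 2, sort = "previous", unlist = FALSE)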
#> [[1]]
#> [1] "measure_t1_i1" "measure_t1_i2"
#>
#> [[2]]
#> [1] "measure_t2_i1" "measure_t2_i2"
#>
#> [[3]]
#> [1] "measure_t3_i1" "measure_t3_i2"
- Say something about “tidy” data (Wickham, 2013)
- Take a closer look at the bad data guide https://github.com/Quartz/bad-data-guide
- In the examples below I’m showing how data stored in these variable names can be transformed into “long” data.
- Add a section on why this is so useful: show some examples, regular expressions, automation, etc.
- Add rename functions
- Currently this only works when adding one specifier. I think more would be cool, but I might need help with this.
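As a rough sketch of the “long” data idea from the list above, the information stored in the variable names can be pulled out with a regular expression in base R. The data frame wide, its values, and the column names time, item and value are assumptions made for this example:

```r
# Hypothetical wide data: 2 time points ("t") x 2 items ("i")
wide <- data.frame(
  id = 1:2,
  measure_t1_i1 = c(3, 1),
  measure_t1_i2 = c(4, 2),
  measure_t2_i1 = c(5, 3),
  measure_t2_i2 = c(6, 4)
)

# One pattern both selects the columns and captures "t" and "i"
pattern <- "^measure_t([0-9]+)_i([0-9]+)$"
measure_cols <- grep(pattern, names(wide), value = TRUE)

# Stack the columns, turning the name parts into variables
long <- do.call(rbind, lapply(measure_cols, function(col) {
  data.frame(
    id = wide$id,
    time = as.integer(sub(pattern, "\\1", col)),
    item = as.integer(sub(pattern, "\\2", col)),
    value = wide[[col]]
  )
}))
long
```

The result has one row per person, time point and item, which only works this easily because every column follows the same naming pattern.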