rec function repeats labels unnecessarily
phildias opened this issue · 2 comments
phildias commented
When the recode function (`sjmisc::rec()`) is used to recode variables and simplify categories, the labels attribute sometimes contains unnecessary duplicates. See the example below.
```r
test_vec <- sample(letters[1:5], replace = TRUE, size = 50)
rec_sj <- sjmisc::rec(test_vec,
rec = "a = 1 [Test 1];
b = 1 [Test 1];
c = 2 [Test 2];
d = 2 [Test 2];
e = 3 [Test 3]")
sjmisc::frq(x = rec_sj)
```
This produces the following output:
```
x <numeric>
# total N=87 valid N=87 mean=1.94 sd=0.77

val  label frq raw.prc valid.prc cum.prc
  1 Test 1  16   18.39     18.39   18.39
  1 Test 1  16   18.39     18.39   36.78
  2 Test 2  21   24.14     24.14   60.92
  2 Test 2  21   24.14     24.14   85.06
  3 Test 3  13   14.94     14.94  100.00
 NA   <NA>   0    0.00        NA      NA
```
Notice that instead of the 3 expected labels, the output contains 5: `Test 1` and `Test 2` each appear twice.
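As a possible workaround for a vector that has already been recoded this way, the repeated entries could be dropped from its labels attribute. This is a minimal sketch, assuming the value labels are stored as a named numeric vector in `attr(x, "labels")` (names = label text, values = codes). `drop_duplicate_labels()` is a hypothetical helper, not part of sjmisc, and whether `frq()` then reports three rows has not been verified here.

```r
# Simulated recode result: the "labels" attribute repeats entries,
# mirroring the frq() output above (values and label texts assumed).
x <- c(1, 1, 2, 2, 3)
attr(x, "labels") <- c("Test 1" = 1, "Test 1" = 1, "Test 2" = 2,
                       "Test 2" = 2, "Test 3" = 3)

# Hypothetical helper: keep only the first occurrence of each
# (value, label) pair in the "labels" attribute.
drop_duplicate_labels <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  if (is.null(labs)) return(x)
  keep <- !duplicated(data.frame(value = unname(labs), label = names(labs)))
  attr(x, "labels") <- labs[keep]
  x
}

x <- drop_duplicate_labels(x)
attr(x, "labels")
# named vector: Test 1 = 1, Test 2 = 2, Test 3 = 3
```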
strengejacke commented
Yes, the intended usage is to combine all old values that map to the same new value in a single clause, like this:
```r
library(sjmisc)
test_vec <- sample(letters[1:5], replace = TRUE, size = 50)
test_vec %>%
  rec(rec = "a,b = 1 [Test 1]; c,d = 2 [Test 2]; e = 3 [Test 3]") %>%
  frq()
#>
#> x <numeric>
#> # total N=50 valid N=50 mean=1.76 sd=0.74
#>
#> val  label frq raw.prc valid.prc cum.prc
#>   1 Test 1  21      42        42      42
#>   2 Test 2  20      40        40      82
#>   3 Test 3   9      18        18     100
#>  NA   <NA>   0       0        NA      NA
```
Created on 2019-11-10 by the reprex package (v0.3.0)
That said, I could check the input for "duplicated" recodes.
strengejacke commented
Since such a pattern might be unintended (e.g., due to a typo), I decided not to silently "change" the recode pattern internally, but to throw a warning instead.
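A check of this kind could parse the recode pattern and warn when more than one clause assigns the same new value. The sketch below is hypothetical: `check_duplicated_recodes()` is not sjmisc's actual implementation, and it only handles simple `old = new [label]` clauses separated by `;`. It also flags repeated target values that carry no label, which may be stricter than the warning sjmisc actually emits.

```r
# Hypothetical check: warn when several clauses of a recode pattern map
# to the same new value, suggesting they could be merged ("a,b = 1").
check_duplicated_recodes <- function(rec) {
  # Split into clauses; trimws() also strips newlines in multi-line patterns
  clauses <- trimws(strsplit(rec, ";", fixed = TRUE)[[1]])
  clauses <- clauses[nzchar(clauses)]
  # Right-hand side of each clause, e.g. "1 [Test 1]"
  rhs <- trimws(sub("^[^=]*=", "", clauses))
  # New value: the text before an optional "[label]"
  new_val <- trimws(sub("\\[.*$", "", rhs))
  dup <- unique(new_val[duplicated(new_val)])
  if (length(dup) > 0) {
    warning("Recode pattern maps several clauses to the same value(s): ",
            paste(dup, collapse = ", "),
            ". Did you mean to combine them, e.g. 'a,b = 1'?",
            call. = FALSE)
  }
  invisible(dup)
}

# Pattern from the original report: warns about values 1 and 2
check_duplicated_recodes("a = 1 [Test 1]; b = 1 [Test 1]; c = 2 [Test 2]; d = 2 [Test 2]; e = 3 [Test 3]")
# Combined pattern: no warning
check_duplicated_recodes("a,b = 1 [Test 1]; c,d = 2 [Test 2]; e = 3 [Test 3]")
```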