strengejacke/sjmisc

rec function repeats labels unnecessarily

phildias opened this issue · 2 comments

When the recode function (sjmisc::rec) is used to recode variables and simplify categories, there are cases where the labels attribute contains some unnecessary duplicates. See the example below.

test_vec = sample(letters[1:5], replace = TRUE, size = 50)

rec_sj = sjmisc::rec(test_vec, 
                     rec = "a = 1 [Test 1];
                            b = 1 [Test 1];
                            c = 2 [Test 2];
                            d = 2 [Test 2];
                            e = 3 [Test 3]")

sjmisc::frq(x = rec_sj)

This produces the following output:

x <numeric>
# total N=87  valid N=87  mean=1.94  sd=0.77

 val  label frq raw.prc valid.prc cum.prc
   1 Test 1  16   18.39     18.39   18.39
   1 Test 1  16   18.39     18.39   36.78
   2 Test 2  21   24.14     24.14   60.92
   2 Test 2  21   24.14     24.14   85.06
   3 Test 3  13   14.94     14.94  100.00
  NA   <NA>   0    0.00        NA      NA

Notice how, instead of having only 3 new labels, the output contains 5, some of which are duplicated.

Yes. The intended usage is to list all old values that map to the same new value in a single recode pair, like this:

library(sjmisc)
test_vec <- sample(letters[1:5], replace = T, size = 50)

test_vec %>% 
  rec(rec = "a,b = 1 [Test 1]; c,d = 2 [Test 2]; e = 3 [Test 3]") %>% 
  frq()
#> 
#> x <numeric>
#> # total N=50  valid N=50  mean=1.76  sd=0.74
#> 
#>  val  label frq raw.prc valid.prc cum.prc
#>    1 Test 1  21      42        42      42
#>    2 Test 2  20      40        40      82
#>    3 Test 3   9      18        18     100
#>   NA   <NA>   0       0        NA      NA

Created on 2019-11-10 by the reprex package (v0.3.0)

But maybe I can check the input for "duplicated" recodes.

As this case might be unintended (for example, due to a typo), I decided not to "change" the recode pattern internally, but to throw a warning instead.
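
For illustration only, a check of this kind could look roughly like the sketch below. `find_duplicated_recodes()` is a hypothetical helper written in base R; it is not a function in sjmisc and does not claim to reproduce the actual change. It assumes recode pairs are separated by `;`, that the new value follows the `=`, and that an optional `[label]` comes last, as in the patterns shown above.

```r
# Hypothetical helper, not part of sjmisc: reports new values that are
# assigned in more than one recode pair of a pattern string.
find_duplicated_recodes <- function(rec) {
  # Split the pattern into single "old = new [label]" pairs.
  pairs <- trimws(strsplit(rec, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]

  # Drop the optional "[label]" part, then keep the text after "=".
  no_label <- sub("\\[.*\\]\\s*$", "", pairs)
  new_values <- trimws(sub("^[^=]*=", "", no_label))

  # Collect every new value that occurs in more than one pair.
  dupes <- unique(new_values[duplicated(new_values)])
  if (length(dupes) > 0) {
    warning(
      "New value(s) assigned in more than one recode pair: ",
      paste(dupes, collapse = ", "),
      ". Consider combining the old values, e.g. \"a,b = 1\".",
      call. = FALSE
    )
  }
  invisible(dupes)
}

# Warns, because the new values 1 and 2 are each assigned twice.
find_duplicated_recodes("a = 1 [Test 1];
                         b = 1 [Test 1];
                         c = 2 [Test 2];
                         d = 2 [Test 2];
                         e = 3 [Test 3]")

# No warning: each new value appears in exactly one pair.
find_duplicated_recodes("a,b = 1 [Test 1]; c,d = 2 [Test 2]; e = 3 [Test 3]")
```

The sketch only warns and leaves the pattern untouched, matching the decision above not to rewrite the user's input.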