Misleading names using `pretty_names` in combination with `cut()`

Question

snhansen opened this issue a year ago · 3 comments

Consider this example:

mtcars |>
  dplyr::mutate(grp = cut(mpg, breaks = c(0,18,20,50))) |>
  lm(wt ~ grp, data = _) |>
  parameters::parameters()
#> Parameter   | Coefficient |   SE |         95% CI | t(29) |      p
#> ------------------------------------------------------------------
#> (Intercept) |        4.01 | 0.18 | [ 3.64,  4.38] | 22.12 | < .001
#> grp [19-20] |       -0.62 | 0.34 | [-1.32,  0.08] | -1.80 | 0.082 
#> grp [21-50] |       -1.59 | 0.25 | [-2.11, -1.08] | -6.33 | < .001

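I think the ranges given in the square brackets are misleading. `cut()` creates intervals that exclude the lower bound and include the upper bound, but labels like `19-20` and `21-50` read like integer ranges, so it's not clear which group a value such as 20.5 would go into. In my opinion, they should be consistent with `pretty_names = "labels"`: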

mtcars |>
  dplyr::mutate(grp = cut(mpg, breaks = c(0,18,20,50))) |>
  lm(wt ~ grp, data = _) |>
  parameters::parameters() |>
  print(pretty_names = "labels")
#> Parameter     | Coefficient |   SE |         95% CI | t(29) |      p
#> --------------------------------------------------------------------
#> (Intercept)   |        4.01 | 0.18 | [ 3.64,  4.38] | 22.12 | < .001
#> grp [(18,20]] |       -0.62 | 0.34 | [-1.32,  0.08] | -1.80 | 0.082 
#> grp [(20,50]] |       -1.59 | 0.25 | [-2.11, -1.08] | -6.33 | < .001
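
To make the ambiguity concrete, here is a small sketch that uses only base R's `cut()` with the same breaks. It shows which interval some boundary and in-between values fall into. It assumes the default `right = TRUE`, so each level `(a,b]` excludes `a` and includes `b`. The values in `x` are chosen for illustration only.

```r
# cut() with the default right = TRUE builds left-open, right-closed intervals (a,b]
x   <- c(18, 18.5, 20, 20.5)
grp <- cut(x, breaks = c(0, 18, 20, 50))

levels(grp)          # "(0,18]" "(18,20]" "(20,50]"
data.frame(x, grp)   # 18 -> (0,18]; 18.5 and 20 -> (18,20]; 20.5 -> (20,50]
```

So 18.5 belongs to the level that the default pretty names print as `19-20`, and 20.5 belongs to the one printed as `21-50`, although neither value lies inside the printed range.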

Answer 1 · 2024-02-06T20:47:32.000Z

What about this? Here the excluded lower bound is marked with `>` (e.g. `>15-20`):

mtcars |>
  datawizard::data_modify(grp = cut(mpg, breaks = c(0, 15, 20, 50))) |>
  lm(wt ~ grp, data = _) |>
  parameters::parameters()
#> Parameter    | Coefficient |   SE |         95% CI | t(29) |      p
#> -------------------------------------------------------------------
#> (Intercept)  |        4.50 | 0.24 | [ 4.01,  4.99] | 18.91 | < .001
#> grp [>15-20] |       -0.99 | 0.29 | [-1.59, -0.40] | -3.40 | 0.002 
#> grp [>20-50] |       -2.08 | 0.28 | [-2.66, -1.50] | -7.32 | < .001
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald t-distribution approximation.

Created on 2024-02-06 with reprex v2.1.0

Answer 2 · 2024-02-06T23:47:49.000Z

That looks good to me, and it's easier to understand than the distinction between `[` and `(` ranges.
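
The `>` style encodes the same information as the bracket notation. As a rough sketch of that mapping, the hypothetical helper `pretty_interval()` below rewrites a `cut()` label by checking its first and last characters. It is not the code used inside `parameters`. It assumes labels of the form `(a,b]`, `[a,b]`, `[a,b)` or `(a,b)` with no commas inside the numbers. For `right = FALSE` intervals, it marks an open upper bound with `<`, which is an arbitrary choice made for this sketch.

```r
# Hypothetical helper (not part of 'parameters'): turn cut() interval labels
# such as "(15,20]" into the ">15-20" style.
pretty_interval <- function(x) {
  open_left  <- substr(x, 1, 1) == "("                  # "(" = lower bound excluded
  open_right <- substr(x, nchar(x), nchar(x)) == ")"    # ")" = upper bound excluded
  inner  <- substr(x, 2, nchar(x) - 1)                  # e.g. "15,20"
  bounds <- do.call(rbind, strsplit(inner, ",", fixed = TRUE))
  lower  <- ifelse(open_left,  paste0(">", bounds[, 1]), bounds[, 1])
  upper  <- ifelse(open_right, paste0("<", bounds[, 2]), bounds[, 2])
  paste0(lower, "-", upper)
}

# Usage: the breaks from the example above
pretty_interval(levels(cut(mtcars$mpg, breaks = c(0, 15, 20, 50))))
# ">0-15" ">15-20" ">20-50"
```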

Answer 3 · 2024-02-07T07:15:54.000Z

Yes, this is great. I like the solution.