dcomtois/summarytools

descr computes statistics (e.g. min, max) incorrectly when a column name ends in the name of a statistic (e.g. "column_min" or "column_max")

yenchiayi opened this issue · 0 comments

I have a small data.frame with 2 rows and 3 columns:

column0 column1 column2
1 11 21
2 12 22

The descr function calculates everything correctly if I name the columns c("x", "x_1", "x_2"):

library(magrittr)  # provides the %>% pipe used below

df <- data.frame(
  x = 1:2,
  x_1 = 11:12,
  x_2 = 21:22
)
df %>%
  summarytools::descr(stats = c("min", "max", "n.valid", "skewness", "kurtosis"))
x x_1 x_2
Min 1.00 11.00 21.00
Max 2.00 12.00 22.00
N.Valid 2.00 2.00 2.00
Skewness 0.00 0.00 0.00
Kurtosis -2.75 -2.75 -2.75

However, if I name the columns c("x", "x_min", "x_max"), descr computes the minimum and maximum incorrectly, and the other statistics ("n.valid", "skewness", and "kurtosis") are wrong as well.

df <- data.frame(
  x = 1:2,
  x_min = 11:12,
  x_max = 21:22
)
df %>% 
  summarytools::descr(stats = c( "min", "max", "n.valid", "skewness", "kurtosis"))

As the output below shows, the Min of the second column (x_max) is even larger than its Max. The other statistics (N.Valid, Skewness, and Kurtosis) are also wrong for the columns x_max and x_min.

x x_max x_min
Min 1.00 21 1
Max 2.00 2 1
N.Valid 2.00 1 1
Skewness 0.00 NA NA
Kurtosis -2.75 NA NA
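
For comparison, the expected values can be cross-checked with base R alone. This is only a minimal sketch to show what the table should contain; it does not use summarytools, and the object name expected is chosen here for illustration.

```r
# Same data as in the failing example
df <- data.frame(
  x = 1:2,
  x_min = 11:12,
  x_max = 21:22
)

# Compute each statistic per column with base R, one row per statistic
expected <- rbind(
  Min     = sapply(df, min),
  Max     = sapply(df, max),
  N.Valid = sapply(df, function(col) sum(!is.na(col)))
)
print(expected)
```

In this cross-check x_min has Min 11 and Max 12, x_max has Min 21 and Max 22, and N.Valid is 2 for every column, which is what descr should report for these columns.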

My preliminary guess is that the program fails to distinguish a column-name suffix (e.g. the "_min" in x_min) from the name of a statistic (e.g. min). The issue seems to arise around lines 367-373 of descr.R. Could you check this and see what happens?
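
To illustrate the kind of ambiguity I suspect, here is a small sketch. It is not the actual code from descr.R. It assumes, purely for illustration, that intermediate summary columns are named "<variable>_<statistic>"; the names summary_names, fragile_min, variable, and statistic are hypothetical.

```r
# Hypothetical intermediate names of the form "<variable>_<statistic>"
summary_names <- c("x_min", "x_max",
                   "x_min_min", "x_min_max",
                   "x_max_min", "x_max_max")
stats <- c("min", "max")

# Fragile: an unanchored pattern also matches variables whose own
# name contains "_min", so "x_min_max" is wrongly picked up as a minimum
fragile_min <- grep("_min", summary_names, value = TRUE)

# More robust: treat only the trailing "_<statistic>" as the statistic
alternatives <- paste(stats, collapse = "|")
variable  <- sub(paste0("_(", alternatives, ")$"), "", summary_names)
statistic <- sub(paste0("^.*_(", alternatives, ")$"), "\\1", summary_names)

print(fragile_min)
print(data.frame(summary_names, variable, statistic))
```

With the anchored pattern, "x_min_max" splits into the variable x_min and the statistic max, whereas the unanchored match puts it among the minimums. Whether descr.R does anything like this is only my guess.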


Thanks!