Getting incorrect results when using stby and descr with weights
CaraghS opened this issue · 3 comments
I am using stby in summarytools to calculate weighted descriptive statistics by group. However, when I do this I get a different answer than when I first filter by the grouping variable and then apply the descr function. See below: mydf is my unfiltered data frame, and score is a 0-10 variable that I want the mean of.
##when I filter first and split my df
filtered_male <- mydf %>% filter(gender == 1)
with(filtered_male, stby(score, gender, descr, weights = weight))
Weighted Descriptive Statistics
score by gender
Data Frame: filtered_male
Weights: weight
N: 838
                    1
Mean             6.86
Std.Dev          2.93
Min              0.00
Median           8.00
Max             10.00
MAD              2.97
CV               0.43
N.Valid    1509584.07
Pct.Valid       99.70
##when I don't split my df
with(mydf, stby(score, gender, descr, weights = weight, simplify = TRUE))
Weighted Descriptive Statistics
score by gender
Data Frame: mydf
Weights: weight
N: 838
                    1            2
Mean             7.01         6.79
Std.Dev          2.81         3.02
Min              0.00         0.00
Median           8.00         8.00
Max             10.00        10.00
MAD              2.97         2.97
CV               0.40         0.45
N.Valid    1715494.12   1379339.65
Pct.Valid       56.05        45.07
Any ideas on why this is happening, or how I can fix it to get the correct weighted mean? (I've checked the answers manually, and the mean where I filter first is correct.) Also, this doesn't seem to be an issue when I don't use weights.
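For anyone wanting to repeat the manual check, here is a minimal base R sketch. It assumes a toy data frame (`toy`, with made-up values) standing in for mydf; only the column names gender, score and weight come from the issue. It shows that filtering first and splitting by group should give the same weighted mean, which is the behaviour expected from stby.

```r
# Hypothetical toy data standing in for mydf; the values are made up.
toy <- data.frame(
  gender = c(1, 1, 1, 2, 2, 2),
  score  = c(2, 8, 10, 0, 6, 9),
  weight = c(1.5, 0.5, 2.0, 1.0, 3.0, 1.0)
)

# Approach A: filter first, then take the weighted mean.
male <- toy[toy$gender == 1, ]
mean_filtered <- weighted.mean(male$score, male$weight)

# Approach B: split by group and apply the same function to each piece.
mean_by_group <- sapply(
  split(toy, toy$gender),
  function(d) weighted.mean(d$score, d$weight)
)

# Both routes should agree for gender 1.
stopifnot(isTRUE(all.equal(mean_filtered, unname(mean_by_group["1"]))))
```

If a grouped summary disagrees with Approach A on the real data, one thing worth checking is whether each group is being paired with its own subset of the weights.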
Can I also add: the weighted median reported appears to be incorrect. It is different from the value calculated using other R packages.
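As a reference point for the median comparison, below is a rough sketch of one common definition of a weighted median: the smallest value at which the cumulative weight reaches half of the total weight. `weighted_median` is a hypothetical helper written for illustration, not a function from summarytools or any other package. Implementations can differ in how they interpolate or handle ties, so small differences between packages are not necessarily errors.

```r
# Hypothetical helper, not part of summarytools: lower weighted median
# without interpolation.
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)   # drop incomplete pairs
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)                 # sort values, carrying weights along
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w)
  # First value whose cumulative weight reaches half of the total weight.
  x[which(cum_w >= sum(w) / 2)[1]]
}

weighted_median(c(2, 8, 10), c(1.5, 0.5, 2.0))
```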
Thanks for reporting this... I know it's been a while, but if you had a reprex it would be very helpful to be able to investigate properly what's going on.
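A reprex along these lines might be enough to investigate. This is only a sketch: the data are simulated (the data frame `sim` and its values are assumptions, not the reporter's data), and the summarytools calls simply mirror the ones in the original post. They run only if the package is installed.

```r
set.seed(42)

# Simulated stand-in for mydf: two groups, a 0-10 score, positive weights.
n <- 200
sim <- data.frame(
  gender = sample(1:2, n, replace = TRUE),
  score  = sample(0:10, n, replace = TRUE),
  weight = runif(n, min = 0.5, max = 3)
)

# Reference weighted means per group, computed with base R only.
ref <- sapply(
  split(sim, sim$gender),
  function(d) weighted.mean(d$score, d$weight)
)
print(ref)

if (requireNamespace("summarytools", quietly = TRUE)) {
  library(summarytools)

  # Grouped call on the full data, as in the original post.
  print(with(sim, stby(score, gender, descr, weights = weight)))

  # Filter-first call for gender 1, as in the original post.
  sim_male <- sim[sim$gender == 1, ]
  print(with(sim_male, stby(score, gender, descr, weights = weight)))
}
```

Comparing the Mean rows of the two printed tables against `ref` would show whether the grouped call and the filter-first call agree.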
Fixed in dev-current