nudomarinero/wquantiles

Weighted median appears to be incorrect

Closed this issue · 4 comments

The weighted median does not appear to be correct, e.g.:

>>> from numpy import array
>>> import weighted as w
>>> w.median(array([1, 2, 3]), array([100, 1, 1]))
1.0198019801980198

Shouldn't the weighted median be an element of the data array? See:
https://en.wikipedia.org/wiki/Weighted_median

Is there another definition of weighted median?

I suppose that it depends on the definition. A "standard" (non-weighted) median is usually either a) one value of the distribution or b) the mean of the two middle values (https://en.wikipedia.org/wiki/Median). Extending the latter case, when the weighted median falls between two values of the distribution (which is very common), a weighted mean of the two values is taken. This fits a view in which the weights and values of the array define a continuous probability density function from which the median is extracted.
I suppose that a version that returns only discrete values of the distribution (or the mean of two values in some specific cases) could be implemented, but I do not know whether it would be useful.

The view of the weighted median as a pdf and the corresponding formulas can be found in https://en.wikipedia.org/wiki/Percentile#Weighted_percentile
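
To make the difference between the two definitions concrete, here is a minimal sketch in plain NumPy. It assumes the percentile-rank formula from the Wikipedia section linked above, and both function names are hypothetical, chosen only for this illustration; it is not meant as a copy of the wquantiles source.

```python
import numpy as np


def interpolated_weighted_quantile(data, weights, q):
    """Weighted quantile with linear interpolation between data points.

    Assumption: each sorted value x_n gets the percentile rank
    p_n = (S_n - w_n / 2) / S_N, where S_n is the cumulative weight.
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(data)
    x, w = data[order], weights[order]
    cum = np.cumsum(w)
    ranks = (cum - 0.5 * w) / cum[-1]
    # Interpolate linearly between the ranked data points.
    return float(np.interp(q, ranks, x))


def discrete_weighted_median(data, weights):
    """Lower weighted median: always an element of the data array."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(data)
    x, w = data[order], weights[order]
    cum = np.cumsum(w)
    # First element whose cumulative weight reaches half of the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(x[idx])


values, weights = [1, 2, 3], [100, 1, 1]
print(interpolated_weighted_quantile(values, weights, 0.5))  # about 1.0198
print(discrete_weighted_median(values, weights))             # 1.0
```

With the data from the report, the interpolated version gives about 1.0198, the value shown at the top of this issue, while the discrete version returns 1.0, which is an element of the array.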

Thank you for the quick response and link to the definition of weighted percentile.

@nudomarinero I was also confused by the results from your implementation and was about to open an issue when I saw this similar one. Here are my investigations, comparing wquantiles with another package, robustats: spinalcordtoolbox/spinalcordtoolbox#3329 (comment)

If I may suggest, to avoid confusion for future users, you might want to state up front in your README that you compute the median using the weighted percentile method, which gives different results from the traditional (discrete) percentile implementation. In any case, thank you very much for providing this resource to the community!