Mann-Whitney U (mwu): the computation of rank-biserial correlation (RBC) is problematic
mmpeng9 opened this issue · 1 comment
Hi there,
I found that the computation of rank-biserial correlation (RBC) is problematic. This is related to #417 and #424.
According to the cited paper, there are three ways to compute RBC. It seems you adopted the third one, the formula from Hans Wendt (1972): r = 1 - (2U) / (n1 * n2). In the paper, U is the smaller of U1 and U2:
Finding the test statistic U requires two steps. First, compute the number of favorable and unfavorable pairs; or what is the same thing, compute U1 and U2, as defined in Equations 1 and 2. Second, select the smaller of the two numbers; this smaller number is the test statistic U.
According to the SciPy documentation, the returned statistic is
the Mann-Whitney U statistic corresponding with sample x; If U1 is the statistic corresponding with sample x, then the statistic corresponding with sample y is U2 = x.shape[axis] * y.shape[axis] - U1.
It seems that the returned U is U1 (the statistic for sample x), which is not necessarily the smaller of U1 and U2. When U1 is the larger of the two, plugging it into the formula gives a negative RBC, whereas according to the paper the value should always be positive. My experiments confirm this.
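Here is a minimal sketch of the sign flip. It does not call SciPy or pingouin; it counts pairs directly, assuming U1 is the number of (x, y) pairs in which the x value is larger (ties counting one half), which I believe matches the statistic SciPy reports for sample x. The function names and the sample data are made up for illustration.

```python
def u_statistics(x, y):
    """Return (U1, U2) by direct pair counting; ties count one half."""
    u1 = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u1 += 1.0
            elif xi == yj:
                u1 += 0.5
    # U1 + U2 always equals n1 * n2.
    u2 = len(x) * len(y) - u1
    return u1, u2


def wendt_rbc(u, n1, n2):
    """Wendt (1972) formula: r = 1 - (2U) / (n1 * n2)."""
    return 1 - (2 * u) / (n1 * n2)


# Illustrative data in which U1 is the larger of the two statistics.
x = [8, 9, 10, 11]
y = [1, 2, 3, 12]
n1, n2 = len(x), len(y)

u1, u2 = u_statistics(x, y)
rbc_from_u1 = wendt_rbc(u1, n1, n2)             # U1 used as-is
rbc_from_min = wendt_rbc(min(u1, u2), n1, n2)   # smaller U, as in the paper

print(f"U1 = {u1}, U2 = {u2}")
print(f"RBC using U1:          {rbc_from_u1}")
print(f"RBC using min(U1, U2): {rbc_from_min}")
```

With U1 plugged in directly the result is negative; with the smaller of the two it is positive and has the same magnitude.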
Thanks!