Compute exact p-values for the Wilcoxon signed-rank test for small sample sizes in polynomial time. Requires numpy and scipy.
Usage
>>> import numpy as np
>>> from wilcoxon_exact import wilcoxon_exact
>>> x = np.array([1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30])
>>> y = np.array([0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29])
>>> wilcoxon_exact(x, y, alternative="greater")
(35.0, 0.01953125)
>>> x = np.array([-6.1, 4.3, 7.2, 8.0, -2.1])
>>> wilcoxon_exact(x, alternative="two-sided")
(7.0, 0.4375)
SciPy's implementation of the Wilcoxon signed-rank test defaults to a normal approximation, while R computes an exact p-value for small sample sizes. While the normal approximations reasonable for sample sizes greater than (approximately) 20, exact p-values are preferable when they are available. I could not find a Python implementation of exact p-values for the Wilcoxon test, and did not want to have to call R for a single statistical test, so I implemented one here.
This code computes exact p-values by recursively solving for the coefficients of the moment generating function. The implementation is tested against exact p-values generated by R.
In Python:
>>> from wilcoxon_exact import wilcoxon_exact
>>> np.random.seed(100)
>>> x = np.random.normal(size=10)
array([-1.74976547, 0.3426804 , 1.1530358 , -0.25243604, 0.98132079,
0.51421884, 0.22117967, -1.07004333, -0.18949583, 0.25500144])
>>> y = np.random.normal(size=10)
array([-0.45802699, 0.43516349, -0.58359505, 0.81684707, 0.67272081,
-0.10441114, -0.53128038, 1.02973269, -0.43813562, -1.11831825])
>>> wilcoxon_exact(x, y, alternative="greater")
(7.0, 0.384765625)
In R:
> options(digits = 10) # print more significant digits
> x <- c(-1.74976547, 0.3426804 , 1.1530358 , -0.25243604, 0.98132079,
0.51421884, 0.22117967, -1.07004333, -0.18949583, 0.25500144)
> y <- c(-0.45802699, 0.43516349, -0.58359505, 0.81684707, 0.67272081,
-0.10441114, -0.53128038, 1.02973269, -0.43813562, -1.11831825)
> res <- wilcox.test(x, y, paired=TRUE, exact=TRUE, alternative="greater")
> res$statistic
V
31
> res$p.value
[1] 0.384765625
Run the code directly to run more tests
bash$ python wilcoxon_exact.py
Note that the test statistics are different. This is because there are a few different versions of the Wilcoxon statistic. The sum of the signed ranks is reported by wilcoxon_exact
, since the underlying implementation computes the pmf of this sum. SciPy reports the sum of the ranks either below or above zero, while R reports "the sum of the ranks of the first sample with the minimum value subtracted."
def wilcoxon_exact(x, y=None, alternative="two-sided"):
"""
Calculate the Wilcoxon signed-rank test statistic and exact p-values.
Given matched samples, x_i and y_i, the Wilcoxon signed-rank test tests the
null that x_i - y_i is symmetric around zero. In practice, it is used to test
whether x_i and y_i are from the same population with different location
parameters.
There are several different versions of the test statistic. The one used here
is
T = sign(z_1) R|z_1| + ... + sign(z_n) R|z_n|
where
z_i = x_i - y_i if y_i is specified
z_i = x_i otherwise.
The pmf has no closed form, but for small sample sizes it is possible to compute
the pmf by solving for the coefficients of the moment generating function.
Parameters
----------
x : array_like
The first set of data values (if y is specified) or the difference
between two sets of data values
y : array_like optional
If specified, the difference x - y is used for the test statistic
alternative : {"two-sided", "greater", "less"}, optional
The alternative hypothesis tested.
If "two-sided", the test is
x_i - y_i > 0 or x_i - y_i < 0
If "greater", the test it
x_i - y_i > 0
If "less", the test is
x_i - y_i < 0
Returns
-------
T : float
The test-statistic.
p : float
The p-value.
Examples
--------
>>> import numpy as np
>>> from wilcoxon_exact import wilcoxon_exact
>>> x = np.array([1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30])
>>> y = np.array([0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29])
>>> wilcoxon_exact(x, y, alternative="greater")
(35.0, 0.01953125)
>>> x = np.array([-6.1, 4.3, 7.2, 8.0, -2.1])
>>> wilcoxon_exact(x, alternative="two-sided")
(7.0, 0.4375)
"""