/wilcoxon-exact

Exact p-values for Wilcoxon signed-rank test in polynomial time.

Primary LanguagePython

Quick Start

Compute exact p-values for the Wilcoxon signed-rank test for small sample sizes in polynomial time. Requires numpy and scipy.

Usage

>>> import numpy as np
>>> from wilcoxon_exact import wilcoxon_exact
>>> x = np.array([1.83,  0.50,  1.62,  2.48, 1.68, 1.88, 1.55, 3.06, 1.30])
>>> y = np.array([0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29])
>>> wilcoxon_exact(x, y, alternative="greater")
(35.0, 0.01953125)

>>> x = np.array([-6.1, 4.3, 7.2, 8.0, -2.1])
>>> wilcoxon_exact(x, alternative="two-sided")
(7.0, 0.4375)

Background

SciPy's implementation of the Wilcoxon signed-rank test defaults to a normal approximation, while R computes an exact p-value for small sample sizes. While the normal approximations reasonable for sample sizes greater than (approximately) 20, exact p-values are preferable when they are available. I could not find a Python implementation of exact p-values for the Wilcoxon test, and did not want to have to call R for a single statistical test, so I implemented one here.

This code computes exact p-values by recursively solving for the coefficients of the moment generating function. The implementation is tested against exact p-values generated by R.

In Python:

>>> from wilcoxon_exact import wilcoxon_exact
>>> np.random.seed(100)
>>> x = np.random.normal(size=10)
array([-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604,  0.98132079,
        0.51421884,  0.22117967, -1.07004333, -0.18949583,  0.25500144])
>>> y = np.random.normal(size=10)
array([-0.45802699,  0.43516349, -0.58359505,  0.81684707,  0.67272081,
       -0.10441114, -0.53128038,  1.02973269, -0.43813562, -1.11831825])
>>> wilcoxon_exact(x, y, alternative="greater")
(7.0, 0.384765625)

In R:

> options(digits = 10) # print more significant digits
> x <- c(-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604,  0.98132079,
          0.51421884,  0.22117967, -1.07004333, -0.18949583,  0.25500144)
> y <- c(-0.45802699,  0.43516349, -0.58359505,  0.81684707,  0.67272081,
         -0.10441114, -0.53128038,  1.02973269, -0.43813562, -1.11831825)
> res <- wilcox.test(x, y, paired=TRUE, exact=TRUE, alternative="greater")
> res$statistic
 V 
31
> res$p.value
[1] 0.384765625

Run the code directly to run more tests

bash$ python wilcoxon_exact.py

Note that the test statistics are different. This is because there are a few different versions of the Wilcoxon statistic. The sum of the signed ranks is reported by wilcoxon_exact, since the underlying implementation computes the pmf of this sum. SciPy reports the sum of the ranks either below or above zero, while R reports "the sum of the ranks of the first sample with the minimum value subtracted."

Documentation

def wilcoxon_exact(x, y=None, alternative="two-sided"):
    """
    Calculate the Wilcoxon signed-rank test statistic and exact p-values.
    
    Given matched samples, x_i and y_i, the Wilcoxon signed-rank test tests the
    null that x_i - y_i is symmetric around zero. In practice, it is used to test
    whether x_i and y_i are from the same population with different location 
    parameters.

    There are several different versions of the test statistic. The one used here
    is
        T = sign(z_1) R|z_1| + ... + sign(z_n) R|z_n|
    where
        z_i = x_i - y_i       if y_i is specified
        z_i = x_i             otherwise.

    The pmf has no closed form, but for small sample sizes it is possible to compute
    the pmf by solving for the coefficients of the moment generating function.

    Parameters
    ----------
        x : array_like
            The first set of data values (if y is specified) or the difference
            between two sets of data values
        y : array_like optional
            If specified, the difference x - y is used for the test statistic
        alternative : {"two-sided", "greater", "less"}, optional
            The alternative hypothesis tested.
            If "two-sided", the test is
                x_i - y_i > 0 or x_i - y_i < 0
            If "greater", the test it
                x_i - y_i > 0
            If "less", the test is
                x_i - y_i < 0

    Returns
    -------
        T : float
            The test-statistic.
        p : float
            The p-value.


    Examples
    --------
    >>> import numpy as np
    >>> from wilcoxon_exact import wilcoxon_exact
    >>> x = np.array([1.83,  0.50,  1.62,  2.48, 1.68, 1.88, 1.55, 3.06, 1.30])
    >>> y = np.array([0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29])
    >>> wilcoxon_exact(x, y, alternative="greater")
    (35.0, 0.01953125)

    >>> x = np.array([-6.1, 4.3, 7.2, 8.0, -2.1])
    >>> wilcoxon_exact(x, alternative="two-sided")
    (7.0, 0.4375)
    """