HolmBonferroni correction generates error with certain p-value lists
RossLabCSU opened this issue · 3 comments
I am encountering an error for certain sets of p-values derived from GO-term analyses. I have traced the error to the HolmBonferroni correction method, but I can't quite figure out the most appropriate modification to GOAtools to handle the issue properly. Below is a minimal example, and the resulting error:
from goatools import multiple_testing
# WORKS
pvals = [0.24, 1.0, 1.0, 1.0, 1.0]
hb_obj = multiple_testing.HolmBonferroni(pvals)
print(hb_obj.corrected_pvals)
# GENERATES ValueError
pvals = [0.25, 1.0, 1.0, 1.0, 1.0]
hb_obj = multiple_testing.HolmBonferroni(pvals)
print(hb_obj.corrected_pvals)
# GENERATES ValueError
# pvals = [1.0, 1.0, 1.0, 1.0]
# hb_obj = multiple_testing.HolmBonferroni(pvals)
# print(hb_obj.corrected_pvals)
Error:
hb_obj = multiple_testing.HolmBonferroni(pvals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File goatools\multiple_testing.py, line 158, in __init__
self.set_correction()
File goatools\multiple_testing.py, line 207, in set_correction
idxs, correction = list(zip(*self._generate_significant()))
^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 0)
The error seems to depend both on the number of p-values and the lowest p-value. I am encountering this mostly for small sets of proteins and proteomes, but it does show up repeatedly in my dataset.
Any help would be appreciated! Thank you.
After a bit more testing, I think I have a possible solution that may be a conceptual improvement as well. If I'm understanding the code correctly, the HolmBonferroni was only applying a correction to p-values that were below their respective nominal alpha thresholds (determined by the number of p-values and their position in a sorted list). However, it seems like a correction should be applied to all p-values less than 1 (presumably making them equal 1 after correction).
To accomplish this, it seems like changing one line in the _generate_significant(self)
method of the HolmBonferroni
class in multiple_testing.py from
if p * 1. / num_pvals < self.a:
to
if p < 1.:
would properly adjust all p-values less than 1. Of course, this would not technically "generate significant" p-values: but it would apply the correction to all non-1 p-values.
When I make this adjustment in the GOATOOLS code locally, the test cases described above (and other test cases) no longer result in ValueErrors and appear to generate the appropriate corrected p-values.
Ah my solution above still fails in situations where all p-values equal 1. Perhaps the following additional modification to handle this corner case:
if all(p==1. for p in self.pvals):
idxs, correction = [(0,), (1,)]
else:
idxs, correction = list(zip(*self._generate_significant()))
Apologies if these are "inelegant" solutions!
Thank you for the deep dive on this. Would you be able to do a pull request on this improvement? This will be super helpful.
Haibao