Multiple testing correction
Hi Nic, cool stuff.
Having had a quick read through it, I think it could be useful to correct the alpha level for multiple testing (as you may have panels with hundreds of SNVs).
Another question: why don't you calculate $p_{\text{val}} = \text{Bin}(n_v \mid c, \pi/2)$ and then check whether $p_{\text{val}} < \alpha$? This would give you an idea of the magnitude by which the distribution differs from your null hypothesis.
Cheers.
My take on these 2:
- that's possible, and it should be arranged. @nicola-calonaci I would divide the required alpha (model input) by the number of tested mutations, so as to adjust the FWER à la Bonferroni (sketched below);
- that is not the p-value though, no? So $p_{\text{val}} < \alpha$ is not a test. The magnitude point is OK.
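On the first point, a minimal sketch of what that adjustment could look like (`adjust_alpha_bonferroni` is a hypothetical helper, not actual package code):

```r
# Minimal sketch (hypothetical helper, not actual package code):
# run each test at alpha / m to bound the FWER at alpha over m tests
adjust_alpha_bonferroni <- function(alpha, n_mutations) {
  alpha / n_mutations
}

adjust_alpha_bonferroni(0.05, 200)  # e.g. a panel with 200 SNVs
#> [1] 0.00025
```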
Yea no no, sorry, of course the p-value is not that expression, my fault. It's the tail sum $p_{\text{val}} = \sum_{k=0}^{n_v} \text{Bin}(k \mid c, \pi/2)$ (for the left-sided case).
While yea, comparing the p-value to alpha is what you usually do in hypothesis testing, I think.
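For reference, that tail probability is a one-liner in R (the values below are made up for illustration):

```r
# Left-tail p-value P(X <= n_v) under the null X ~ Bin(c, pi/2),
# rather than the point probability Bin(n_v | c, pi/2)
nv     <- 20    # hypothetical variant read count
cov    <- 100   # hypothetical coverage
purity <- 0.8   # hypothetical tumour purity (pi)

p_val <- pbinom(nv, size = cov, prob = purity / 2)
p_val
```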
I think that the quantile-based test $n_v < q_{\alpha/2}$, with $q_{\alpha/2}$ the $\alpha/2$ quantile of $\text{Bin}(c, \pi/2)$, and the p-value-based test $p_{\text{val}} < \alpha/2$ are equivalent (and the same for the right-sided test), and that the former was implemented just to make the plotting function easier. We can definitely switch back to the binom.test R function and implement the Bonferroni or BH corrections. Btw binom.test returns the extrema of the chosen confidence interval too, so we could keep the same format for the plotting function.
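To keep the same plotting format, the relevant fields of the binom.test return object would be (toy numbers):

```r
# binom.test returns an htest object carrying both the p-value and the
# Clopper-Pearson confidence interval for the observed proportion
res <- binom.test(x = 20, n = 100, p = 0.4, conf.level = 0.95)

res$p.value   # p-value of the test against p = pi/2
res$conf.int  # CI extrema, reusable by the plotting function
```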
Definitely, they are exactly the same.
My only point is that having a p-value gives you a more interpretable measure of how far you are from your null hypothesis $H_0$.
For ex. if I tell you that the p-value is 1e-5, you know that this corresponds to a very low likelihood of observing the data under $H_0$ (1 in 100,000 trials), while if you tell me that the difference of $n_v$ from the quantile is some number of reads, that is much harder to interpret on its own.
But the test of the quantiles is correct as it is, of course. My point is just that it would be useful to also report the p-values.
But what we do when we compare $n_v$ to the quantile $q_{\alpha/2}$ is the same thing as inverting a p-value, no? We are asking what is the quantile at which we would observe the data with probability $\alpha/2$ under the null.
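A quick numeric check of this inversion, using R's qbinom convention (smallest $q$ with $P(X \le q) \ge \alpha/2$), under which the two criteria agree exactly (toy values):

```r
# n_v < qbinom(alpha/2, c, pi/2)  <=>  pbinom(n_v, c, pi/2) < alpha/2
alpha <- 0.05
cov   <- 100
p0    <- 0.4  # pi/2 under the null, with purity pi = 0.8

q_low <- qbinom(alpha / 2, size = cov, prob = p0)
all(vapply(0:cov, function(nv) {
  (nv < q_low) == (pbinom(nv, size = cov, prob = p0) < alpha / 2)
}, logical(1)))
#> [1] TRUE
```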
Maybe @nicola-calonaci and @Militeee, can you also associate a p-value to the test? That might be easier to communicate, even though right now the user has to specify the input alpha.
Just made the run_classifier function return p-values for the subclonal and LOH tests, adjusted with the Benjamini-Hochberg correction.
By default, I implemented it such that the BH correction takes into account the whole dataset.
Do you agree on leaving to the user the choice of correcting p-values by sample, or by gene across samples? I think it depends on how one specifically wants to perform the classification.
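For concreteness, the two options would look roughly like this on a hypothetical data frame of per-mutation p-values (the `sample` and `p_value` columns are illustrative, not the actual run_classifier output format):

```r
# Hypothetical input: one row per tested mutation
df <- data.frame(
  sample  = c("P1", "P1", "P1", "P2", "P2"),
  p_value = c(0.001, 0.030, 0.200, 0.004, 0.400)
)

# Option 1: BH across the whole dataset (all samples pooled)
df$p_adj_dataset <- p.adjust(df$p_value, method = "BH")

# Option 2: BH within each sample separately
df$p_adj_sample <- ave(df$p_value, df$sample,
                       FUN = function(p) p.adjust(p, method = "BH"))
```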
But does it make sense to adjust for the whole cohort? I don't think so honestly.
I think we should use an adjustment per sample (if the patient has X mutations, we adjust for X tests). I have no idea what the "genes" you mention are because there can be multiple mutations in the same gene.
I mean patients are independent no?
I am not sure: say you have a cohort of 100 (independent) patients for which you try to classify mutations on gene TP53. If you run the tests simultaneously, you would have a probability of getting a significant result (e.g. LOH class) by chance of $1 - (1 - \alpha)^{100}$, and with $\alpha = 0.05$ this would be as large as 99%.
Wouldn't this happen even if samplings from different patients are independent?
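For the record, the arithmetic behind that 99% figure:

```r
# Probability of at least one false positive across 100 independent
# uncorrected tests at level alpha = 0.05
1 - (1 - 0.05)^100
#> [1] 0.9940795
```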
But your tool:
- is not a tester for a single gene, rather it is a tester for somatic mutations detected by a panel;
- should run on data from a single patient, not on data from multiple patients.
I don't really see why you claim point 2.
Yea, I agree with Giulio on this one, I would correct by sample.
Which means we develop a tool that works only on a single sample, like mobster or others.
Ok guys, many thanks for your help on this. I just pushed the new code with BH correction done per sample.