natsuhiko/rasqual

Multiple comparisons correction

arielmadr opened this issue · 5 comments

Dear Natsuhiko,

Thanks a lot for the work on Rasqual, it is a great tool. I am working with ChIP-seq data, and would like to compute FDR.

I saw in a recent paper ( Schwartzentruber 2017) that authors performed one permutation and then estimated FDR. Do you have a recommendation on how many permutations to run? Also, if I understand correctly, this FDR will be for each peak. So, for reporting the significance of a given variant the reported BH q-value from Rasqual is sufficient, and FDR is calculated only to get significant peaks, right?

I know that this is more a conceptual question so I appreciate a lot your help and time.

Best,
Ariel

Hi Ariel,

Thanks for using RASQUAL. As you mentioned, the permutation test is adjusting genome-wide multiple testing. I would recommend 2 or 3 permutation runs to make sure the number of significant QTL peaks does not wildly change. I paste the R code to get a FDR threshold with the Q-values:

# q1 : real lead Q-value vector for all peaks from RASQUAL
# q0 : permutated Q-value vector
# alpha : FDR threshold
# This function returns the P-value threshold corresponding to FDR=alpha.
getFDR <-
function(q1, q0, alpha=0.1, z=NULL, subset=NULL){
	if(is.null(z)){
		a=0
		for(itr in 1:10){
			a=getFDR(q1,q0,alpha,rev(a+0:100/100^itr),subset)
		}
		a
	}else{
		if(!is.null(subset)){
			q1=q1[subset]
			q0=q0[subset]
		}
		q1=q1[!is.na(q1)]
		q0=q0[!is.na(q0)]
		x=NULL;
		for(i in z){
			x=c(x,sum(q0<i)/length(q0)/(sum(q1<i)/length(q1)))
		};
		max(c(0,z[x<alpha]),na.rm=T)
	}
}
# Example usage:
flag_fdr10 = q1 < getFDR(q1, q0, 0.1) # True = significant QTLs
table(flag_fdr10)

Best regrads,
Natsuhiko

Thanks a lot for your clear and prompt response Natsuhiko!

Ariel

Dear Natsuhiko,

Just a small question, can the permutation-Qvalue vector be n times longer than the real-lead vector ( where n is the number of permutations)?

thanks a lot,
Ariel

Dear Ariel,

Ideally we need to perform the permutation test n times to obtain the SNP level adjusted P-value for each peak in 1/n resolution. However, it is impossible to perform the permutation test for the SNP level because RASQUAL is computationally very intensive. Therefore we use the standard BH method to obtain SNP level Q-value for each peak. Subsequently, we perform permutation test once (or handful number) for genome-wide multiple testing correction. The "-r" option just performs the permutation test once genome-wide to obtain one permuted Q-value for each peak.

Best regards,
Natsuhiko

Hello,
I have a question about permutation test. If my RASQUAL mapping was divided into separate chunk and chromosomes, can I apply genome-wide FDR for all the tests combined?
Also, since RASQUAL is run on each peak, I'm assuming the permutation is done within each peak. But when calculating FDR, only the sorted q-value and permuted q-value matters, so it seems that the permuted Q-value from each peak cannot be compared to the q-value from the same peak?

Thanks!!