natsuhiko/rasqual

2707 caQTLs in the RASQUAL paper

YaCui opened this issue · 12 comments

YaCui commented

Dear Natsuhiko,
Thanks so much for developing RASQUAL! Could you provide the 2707 caQTLs identified in the RASQUAL paper?

best,
Ya

Hi,

Here is the link to the Google Drive folder: https://drive.google.com/open?id=0B-aFDIHv9Wy3M3kwS1hPM09TRlU

You can find the peak annotation (peaks.bed.gz) as well as the peak IDs at FDR 10% (pid.fdr10.txt).
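To pull out the annotation for just the FDR 10% peaks, a minimal Python sketch could intersect the two files by peak ID. The file layouts are assumptions, not documented in this thread: pid.fdr10.txt is taken to hold one peak ID per line, and peaks.bed.gz is taken to be a tab-separated BED file with the peak ID in the 4th column (`id_col=3`, 0-based). The helper names `read_ids`, `select_peaks` and `load_fdr10_peaks` are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, Set


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect one peak ID per non-empty line (assumed layout of pid.fdr10.txt)."""
    return {line.strip() for line in lines if line.strip()}


def select_peaks(bed_lines: Iterable[str], ids: Set[str], id_col: int = 3) -> Iterator[List[str]]:
    """Yield tab-split BED rows whose ID column (0-based, assumed to be 3) is in ids."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > id_col and fields[id_col] in ids:
            yield fields


def load_fdr10_peaks(pid_path: str = "pid.fdr10.txt",
                     bed_path: str = "peaks.bed.gz") -> List[List[str]]:
    """Read both files (default names as in the Google Drive folder) and return matching rows."""
    with open(pid_path) as fh:
        ids = read_ids(fh)
    with gzip.open(bed_path, "rt") as fh:  # "rt" decodes the gzip stream as text
        return list(select_peaks(fh, ids))
```

If the ID sits in a different column of peaks.bed.gz, only `id_col` needs to change; the length of the returned list should then match the number of IDs in pid.fdr10.txt.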

I would, however, recommend using the latest caQTL results with 100 British samples presented in our latest paper (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics).

Best regards,

Natsuhiko

YaCui commented

Great! Thanks for sharing!

best,
Ya

YaCui commented

Dear Natsuhiko,
I have a small question. How should I determine the values of -l and -m? Can I just use "-l 378 -m 62" in my analysis for all features?

Thanks,
Ya

You need to count the appropriate numbers of SNPs for each feature yourself. Counting the number of tested SNPs (-l) is relatively easy: count the rows of the VCF that is fed to RASQUAL (you can just use the wc command on Linux). If you have enough memory and are not sure how to count the SNPs overlapping with multiple features, you can set the number of feature SNPs (-m) to the number of tested SNPs.
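A minimal Python sketch of both counts is shown below. It assumes that the input is the header-free VCF text printed by tabix for the cis-window (as in the README command), that POS is the 2nd VCF column, and that a "feature SNP" is one whose position falls inside any of the 1-based, inclusive intervals passed to -s/-e. That interval definition is an assumption for illustration; the helpers `parse_coords` and `count_snps` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def parse_coords(csv: str) -> List[int]:
    """Turn a comma-separated -s/-e argument into a list of ints."""
    return [int(x) for x in csv.split(",") if x]


def count_snps(vcf_lines: Iterable[str],
               starts: Sequence[int],
               ends: Sequence[int]) -> Tuple[int, int]:
    """Return (l, m): tested SNPs and SNPs inside any [start, end] feature interval."""
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    intervals = list(zip(starts, ends))
    n_tested = 0
    n_feature = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip VCF header lines and blank lines
        n_tested += 1  # every data row counts towards -l
        pos = int(line.split("\t")[1])  # POS is the 2nd VCF column
        if any(s <= pos <= e for s, e in intervals):
            n_feature += 1  # SNP lies within the feature -> counts towards -m
    return n_tested, n_feature
```

For the README example, the lines would come from `tabix data/chr11.gz 11:2315000-2340000`, with `parse_coords("2316875,2320655,...")` supplying the -s and -e lists. If the feature-SNP definition is unclear, the fallback described above (-m equal to -l) avoids undercounting at the cost of memory.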

Best regards,
Natsuhiko

YaCui commented

Dear Natsuhiko,
I am a little confused about the RASQUAL results. I can get results like "rasqual_atac_1M.gz", but how can I get the q-values in "Q.val.txt.gz"? The q-values in "Q.val.txt.gz" seem to differ from the "Log_10 Benjamini-Hochberg Q-value" in "rasqual_atac_1M.gz".

All files are from https://drive.google.com/drive/folders/0B-aFDIHv9Wy3M3kwS1hPM09TRlU.

Thanks,
Ya

Sorry for the confusion. The file "rasqual_atac_1M.gz" is old, and its 10th column is not the Q value. This is because we provide the Q values in a separate file (Q.val.txt.gz).

Best regards,
Natsuhiko

YaCui commented

Hi Natsuhiko,
So how can I get the Q-value file? I cannot get this file if I just run commands like the one below:

cd $RASQUALDIR
tabix data/chr11.gz 11:2315000-2340000 | bin/rasqual -y data/Y.bin -k data/K.bin -n 24 -j 1 -l 378 -m 62 -s 2316875,2320655,2321750,2321914,2324112 -e 2319151,2320937,2321843,2323290,2324279 -t -f C11orf21 -z

Thanks,
Ya

Sorry, but I don't understand your problem. I believe Q.val.txt.gz gives you the Q value for each peak in the rasqual_atac_1M.gz file.
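A minimal Python sketch of attaching the peak-level Q values to the RASQUAL rows is shown below. The layout of Q.val.txt.gz is not described in this thread. The sketch assumes tab-separated rows with the peak ID in the first column and the Q value in the last column, and it assumes that the feature (peak) ID is the first column of rasqual_atac_1M.gz. The helper names `read_qvalues` and `annotate` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def read_qvalues(lines: Iterable[str]) -> Dict[str, float]:
    """Map peak ID -> Q value, assuming 'peakID<TAB>...<TAB>Q' rows (hypothetical layout)."""
    qvals: Dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        try:
            qvals[fields[0]] = float(fields[-1])
        except ValueError:
            continue  # skip a header row or non-numeric entries
    return qvals


def annotate(rasqual_lines: Iterable[str], qvals: Dict[str, float]) -> Iterator[List[str]]:
    """Append the peak-level Q value (or 'NA') to each row, keyed on column 1 (feature ID)."""
    for line in rasqual_lines:
        fields = line.rstrip("\n").split("\t")
        q = qvals.get(fields[0])
        yield fields + ["NA" if q is None else repr(q)]
```

Both files can be opened with `gzip.open(path, "rt")` and passed in line by line. Rows whose appended value is below 0.1 would then correspond to peaks at FDR 10%, provided that the assumed column layout is correct.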

The example command on the GitHub page is for RNA-seq, not for the ATAC-seq data we provided in the Google Drive folder.

Best regards,
Natsuhiko

YaCui commented

Hi Natsuhiko,
Got it. Thank you so much for your help.

Thanks,
Ya

Hi Natsuhiko,

Regarding the caQTL results with 100 British samples (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics): I have your summary statistics with the probabilities, but I don't know what cutoff you used to define a caQTL or how many caQTLs there are in total. I cannot find this in the paper. Thank you very much!
Paola

Hi Paola,

The RASQUAL mapping result based on 24 LCLs (not 100 LCLs) is found here: https://drive.google.com/drive/folders/0B-aFDIHv9Wy3M3kwS1hPM09TRlU

The paper you cited is different. In the paper, we used 100 LCLs and performed caQTL mapping with a different approach to detect causal interactions in the genome. Because we used a Bayesian approach, we don't have "significant caQTLs" but just posterior probabilities.

Best regards,
Natsuhiko

Thank you Natsuhiko!
Yes, I have been using the results from the 24 LCLs of the first study, but since in your comment above you said:
"I would, however, recommend to use the latest caQTL result with 100 British samples presented in our latest paper (https://www.nature.com/articles/s41588-018-0278-6?WT.feed_name=subjects_epigenetics)", I thought that you had also identified caQTLs there, maybe more than with 24 samples, so I thought I would use this new study. Anyway, I can just use the results from the 24 samples!
Thank you very much!!
Paola