natsuhiko/rasqual

Using -m and -l

Dear Natsuhiko,

I would like to get a clarification on using -m and -l for RNA-Seq data.

I am primarily interested in allele-specific gene expression, so I am planning to use the SNPs within exons with the "--as-only" option so that I end up with genes showing allele-specific expression. Is this the right approach?

In a hypothetical scenario, there could be 200 SNPs that span the genomic space of the gene (exons + introns), but only 10 SNPs might overlap the exons. In this case, should I say -m 10 -l 200?
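For illustration, the two counts in this scenario could be obtained with a sketch like the one below. The function name, the toy coordinates, and the 1-based inclusive intervals are all assumptions made for the example and are not part of RASQUAL. It only produces the number of exonic SNPs and the number of SNPs in the gene body; whether those are the appropriate values for -m and -l is exactly what is being asked here.

```python
def count_snps(snp_positions, gene_start, gene_end, exons):
    """Return (n_exonic, n_gene_body) for one gene.

    Hypothetical helper: coordinates are assumed to be 1-based and
    inclusive, and `exons` is a list of (start, end) tuples.
    """
    # SNPs anywhere in the gene body (exons + introns)
    in_gene = [p for p in snp_positions if gene_start <= p <= gene_end]
    # Subset of those SNPs that overlap at least one exon
    in_exons = [p for p in in_gene if any(s <= p <= e for s, e in exons)]
    return len(in_exons), len(in_gene)


# Toy data, invented for the example
gene_start, gene_end = 1000, 5000
exons = [(1000, 1200), (3000, 3300), (4800, 5000)]
snps = [900, 1050, 1100, 2000, 3100, 4000, 4900, 5200]

n_exonic, n_gene = count_snps(snps, gene_start, gene_end, exons)
print(n_exonic, n_gene)
```

In the scenario described above, the same two numbers would be 10 and 200.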

Also, sometimes I get "-nan" in column 12 (Squared correlation between prior and posterior genotypes (fSNPs)). What does this mean?

Thanks,
Goutham A

Thanks for responding.

Could you explain what “allele-specific gene expression” specifically means? RASQUAL can be used to map eQTLs using the allele-specific signal, but it is not able to estimate allelic imbalance from expression data in general.

If I use the "--as-only" option, I thought that the association test is done using only allelic counts (allelic ratios), and if I restrict my SNPs to fSNPs, I would end up with fSNPs that show allelic imbalance. Sorry if I am mistaken.

I think you can introduce a dummy rSNP at which all individuals are heterozygous (if you have multiple samples) into the VCF file and test whether there is an expression difference between the two alleles at the rSNP linked to fSNPs in coding regions. However, I'm not sure whether this works well for (1) estimating the over-dispersion and (2) controlling the P-value distribution under the null hypothesis. I would recommend testing internally whether RASQUAL can be used to analyse allele-specific differential expression in general.
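A minimal sketch of the dummy-rSNP idea, assuming a plain tab-separated VCF record with a GT-only FORMAT column. The function name, SNP ID, and placeholder alleles are hypothetical, and any other per-sample fields the real input file carries would need to be added so the record matches the rest of the file.

```python
def dummy_het_record(chrom, pos, sample_ids, snp_id="dummy_rSNP"):
    """Build one VCF data line in which every sample is heterozygous.

    Hypothetical helper: REF/ALT are placeholders ("A"/"G") and the
    FORMAT column contains GT only.
    """
    # Phased heterozygous genotype for every sample
    genotypes = ["0|1"] * len(sample_ids)
    # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT
    fixed = [chrom, str(pos), snp_id, "A", "G", ".", "PASS", ".", "GT"]
    return "\t".join(fixed + genotypes)


# Toy sample names, invented for the example
record = dummy_het_record("chr1", 12345, ["s1", "s2", "s3"])
print(record)
```

The record would then be placed in the VCF at a position linked to the coding-region fSNPs, keeping the file sorted by position.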

Thanks, I will explore in that direction.