qglobal_cv and qallsubs_cv = 1
antu2817 opened this issue · 2 comments
Hi,
I ran dNdScv on around 300 mutations from mouse exomes (including 50 indels). Both qglobal_cv and qallsubs_cv were equal to 1, so I used pglobal_cv and pallsubs_cv to extract significant genes with the following commands:
sel_cv[sel_cv$pglobal_cv<0.1, c("gene_name","pglobal_cv")]  # 56 genes obtained
sel_cv[sel_cv$pallsubs_cv<0.1, c("gene_name","pallsubs_cv")]  # 60 genes obtained
I would like to know if I can use pglobal_cv or pallsubs_cv to extract significant genes when I get zero hits with qglobal_cv and qallsubs_cv.
I am attaching global dn/ds estimates and theta values for your reference.
Any help would be highly appreciated!!
Thanks
Ananth
Hi Ananth,
Thank you for your message and your interest in dNdScv.
I am afraid you should only use the q-values. The q-values are the p-values adjusted for multiple testing. Under ideal conditions, a p-value<0.05 is expected to occur by chance under neutrality (the null hypothesis) in 5% of genes. So, when testing a large number of genes, many may show significant p-values in the absence of selection. Adjusting the p-values for multiple testing (giving q-values) protects you against these false positives. So, for the purpose of driver discovery, only q-values should be used.
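To illustrate the point about false positives, here is a minimal sketch in base R. It simulates p-values for genes under pure neutrality and compares how many fall below 0.1 before and after adjustment. The gene count of 20000 and the choice of Benjamini-Hochberg adjustment via `p.adjust` are assumptions made for illustration; the simulated p-values are not dNdScv output.

```r
# Simulate p-values for genes under the null hypothesis (no selection).
# Under neutrality, p-values are uniformly distributed between 0 and 1.
set.seed(1)
n_genes <- 20000          # assumed number of tested genes (illustrative)
p <- runif(n_genes)

# Adjust for multiple testing (Benjamini-Hochberg, assumed here).
q <- p.adjust(p, method = "BH")

# Count "hits" at the 0.1 threshold before and after adjustment.
n_p_hits <- sum(p < 0.1)  # roughly 10% of genes, all of them false positives
n_q_hits <- sum(q < 0.1)  # expected to be very few, typically none

cat("genes with p < 0.1:", n_p_hits, "\n")
cat("genes with q < 0.1:", n_q_hits, "\n")
```

With no real signal in the data, the unadjusted threshold still returns a long list of genes, which is the situation that filtering on pglobal_cv or pallsubs_cv alone cannot distinguish from true selection.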
If power is limiting, as seems to be the case in your dataset, you could consider restricted hypothesis testing, that is, performing the p-value adjustment only on an a priori list of known cancer genes. However, this list should be defined before analysing the data, to avoid the temptation of including suggestively mutated genes in the list a posteriori.
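As a rough sketch of what restricted hypothesis testing could look like in R: the `sel_cv` table below is a toy stand-in with made-up p-values, `known_cancer_genes` is a hypothetical a priori list, and `qglobal_restricted` is a column name introduced here for illustration. Benjamini-Hochberg adjustment via `p.adjust` is assumed.

```r
# Toy stand-in for the sel_cv table returned by dNdScv (values are made up).
sel_cv <- data.frame(
  gene_name  = c("Trp53", "Kras", "Pten", "GeneA", "GeneB", "GeneC"),
  pglobal_cv = c(0.001, 0.004, 0.20, 0.03, 0.50, 0.90),
  stringsAsFactors = FALSE
)

# Hypothetical a priori list of known cancer genes, fixed BEFORE looking at the data.
known_cancer_genes <- c("Trp53", "Kras", "Pten")

# Keep only the genes on the a priori list.
sel_known <- sel_cv[sel_cv$gene_name %in% known_cancer_genes, ]

# Adjust p-values across the restricted set only, so the multiple-testing
# burden reflects the number of genes on the list rather than all genes.
sel_known$qglobal_restricted <- p.adjust(sel_known$pglobal_cv, method = "BH")

# Report genes that pass the threshold within the restricted set.
signif_known <- sel_known[sel_known$qglobal_restricted < 0.1,
                          c("gene_name", "pglobal_cv", "qglobal_restricted")]
print(signif_known)
```

Because fewer hypotheses are tested, the adjustment is less severe, but the result is only valid if the list was not influenced by the mutation data.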
Best,
Inigo
Hi Inigo,
Thank you for the prompt response, suggestion and explanation. I agree that adjusted p-values are the right choice, but I was curious whether the unadjusted p-values could be considered as well. I have since increased my dataset to 20000 mutations from 30 samples, and this time I get 17 genes with qglobal_cv<0.1. However, the theta value is low, which I understand may reflect a problem with the suitability of the dNdScv model. Please let me know if I can still consider these 17 genes as significant hits from this dataset.