im3sanger/dndscv

qglobal_cv and qallsubs_cv = 1

antu2817 opened this issue · 2 comments

Hi,

I ran dNdScv on around 300 mutations from mouse exomes (including 50 indels). Both qglobal_cv and qallsubs_cv were equal to 1, so I used pglobal_cv and pallsubs_cv instead to extract significant genes with the following commands:

sel_cv[sel_cv$pglobal_cv < 0.1, c("gene_name", "pglobal_cv")]    # 56 genes obtained
sel_cv[sel_cv$pallsubs_cv < 0.1, c("gene_name", "pallsubs_cv")]  # 60 genes obtained

I would like to know whether I can use pglobal_cv or pallsubs_cv to extract significant genes when qglobal_cv and qallsubs_cv give zero hits.

I am attaching the global dN/dS estimates and theta values for your reference.

Screenshot from 2020-08-21 10-21-24
Screenshot from 2020-08-21 10-06-47

Any help would be highly appreciated!!

Thanks
Ananth

Hi Ananth,

Thank you for your message and your interest in dNdScv.

I am afraid you should only use the q-values. The q-values are the p-values adjusted for multiple testing. Under ideal conditions, a p-value < 0.05 is expected to occur by chance under neutrality (the null hypothesis) in 5% of genes. So, when testing a large number of genes, many may show significant p-values in the absence of selection; for example, testing 20,000 genes would be expected to yield about 1,000 genes with p < 0.05 by chance alone. Adjusting the p-values for multiple testing (q-values) protects you against these false positives. So, for the purpose of driver discovery, only q-values should be used.
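A minimal R sketch of this point, using simulated p-values rather than dNdScv output. It assumes the q-values come from a Benjamini-Hochberg adjustment, as implemented by base R's `p.adjust(method = "BH")`; treat that as an assumption about dNdScv rather than a description of its code. Under a pure null, roughly 5% of raw p-values fall below 0.05, while almost nothing survives q < 0.1.

```r
# Simulated null: 20,000 genes with no selection, so every p-value is
# drawn from Uniform(0, 1). These are toy values, not dNdScv results.
set.seed(42)
n_genes <- 20000
p_null <- runif(n_genes)

# Raw p-values: about 5% fall below 0.05 purely by chance.
n_raw <- sum(p_null < 0.05)

# Benjamini-Hochberg adjustment (assumed here as the q-value method).
q_null <- p.adjust(p_null, method = "BH")
n_adj <- sum(q_null < 0.1)

cat("genes with p < 0.05:", n_raw, "\n")
cat("genes with q < 0.1: ", n_adj, "\n")

# Small worked case: with 3 tests, BH multiplies the k-th smallest p-value
# by 3/k and then enforces monotonicity across ranks.
p_small <- c(0.001, 0.01, 0.5)
q_small <- p.adjust(p_small, method = "BH")
print(q_small)  # 0.003 0.015 0.500
```

With real dNdScv output, the equivalent selection is a filter on the q-value column, e.g. `sel_cv[sel_cv$qglobal_cv < 0.1, c("gene_name", "qglobal_cv")]`.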

If power is limiting, as seems to be the case in your dataset, you could consider restricted hypothesis testing, performing the p-value adjustment only on an a priori list of known cancer genes. However, this list should be defined before analysing the data, to avoid the temptation of adding suggestively mutated genes to it a posteriori.
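A minimal sketch of restricted hypothesis testing in R, under stated assumptions: `sel_cv` below is a hypothetical toy table with made-up p-values (only the column names mirror the dNdScv output used above), `known_genes` is a hypothetical pre-specified list, and the adjustment is Benjamini-Hochberg via base R's `p.adjust`. The point is that adjusting over 3 pre-chosen tests instead of 1,000 greatly reduces the multiple-testing penalty.

```r
# Hypothetical toy stand-in for a dndscv output table (not real results):
# 1000 genes, two with small p-values and the rest at 0.5.
sel_cv <- data.frame(
  gene_name  = paste0("gene", 1:1000),
  pglobal_cv = c(0.001, 0.004, rep(0.5, 998)),
  stringsAsFactors = FALSE
)

# Genome-wide Benjamini-Hochberg adjustment across all tested genes.
sel_cv$q_genomewide <- p.adjust(sel_cv$pglobal_cv, method = "BH")

# Hypothetical a priori gene list, fixed BEFORE looking at the results.
known_genes <- c("gene1", "gene2", "gene3")

# Restricted testing: adjust only within the pre-specified list,
# so the multiple-testing burden is 3 tests instead of 1000.
restricted <- sel_cv[sel_cv$gene_name %in% known_genes,
                     c("gene_name", "pglobal_cv")]
restricted$q_restricted <- p.adjust(restricted$pglobal_cv, method = "BH")

print(sel_cv[1:3, c("gene_name", "pglobal_cv", "q_genomewide")])
print(restricted)
```

In this toy case the two small p-values give q = 0.5 genome-wide but q = 0.003 and 0.006 within the restricted list. The gain is only legitimate if the list was chosen independently of the data.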

Best,
Inigo

Hi Inigo,
Thank you for the prompt response, the suggestion and the explanation. I agree that adjusted p-values are the right choice; I was just curious whether the unadjusted p-values could also be considered. I have since increased my dataset to include 20,000 mutations from 30 samples, and this time I get 17 genes with qglobal_cv < 0.1. However, the theta value is low, which I understand may indicate a problem with the suitability of the dNdScv model for these data. Can I still consider these 17 genes as significant hits from this dataset?
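For context on why a low theta matters, here is a small R sketch. It assumes that theta is the overdispersion (size) parameter of a negative binomial in the standard parameterization used by R's `dnbinom(size = theta, mu = mu)`, where the variance is mu + mu^2 / theta; this is an assumption about how dNdScv models gene-level background mutation counts, not a statement of its internals. A smaller theta means more unexplained variation in mutation counts across genes for the same expected count.

```r
# Assumed parameterization: negative binomial with mean mu and size theta,
# as in R's dnbinom(size = theta, mu = mu). Variance = mu + mu^2 / theta.
nb_variance <- function(mu, theta) mu + mu^2 / theta

mu <- 2  # hypothetical expected number of mutations in a gene
thetas <- c(0.5, 2, 10, 100)
print(data.frame(theta = thetas, variance = nb_variance(mu, thetas)))

# Empirical check by simulation for a low and a high theta.
set.seed(1)
x_low  <- rnbinom(1e5, size = 0.5, mu = mu)
x_high <- rnbinom(1e5, size = 100, mu = mu)
cat("simulated variance, theta = 0.5:", var(x_low), "\n")
cat("simulated variance, theta = 100:", var(x_high), "\n")
```

As theta grows, the variance approaches mu (the Poisson limit); as it shrinks, the mu^2 / theta term dominates.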

Best
Ananth
Screenshot from 2020-08-21 11-15-00