gagneurlab/OUTRIDER

OUTRIDER validation: known outlier not found

Closed this issue · 3 comments

Hello OUTRIDER developers!

Thanks for the great package - it was easy to install and run, and produces very nice pictures.

However, after running the examples, I tried to find the expression outlier in a known case. It is a clear outlier in the biological/clinical sense of the word: one copy of the gene is damaged, the expression is 2X lower than the mean, and it is lower than in all of the controls (see the picture).

[Figure: expression_rank]

However, it is not significant in OUTRIDER (and with a p-value of 1 it never will be?). Statistically naive approaches restricted to a limited set of genes relevant to the disease will find this outlier: for example, looking for low Z scores, or a simple t-test of the patient's expression against the mean of the healthy cohort, with the p-value corrected by multiplying by the number of tests (genes).
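For readers who want to reproduce the naive approach described above, here is a minimal base-R sketch. Everything in it is an assumption made for illustration: the data are simulated log2 expression values, the panel of 20 genes and the name gene7 are hypothetical, and the one-sided test of a single patient value against the controls is just one possible reading of "t-test against the healthy cohort". It is not part of OUTRIDER.

```r
# Simulated toy data (hypothetical): log2 expression for a 20-gene disease panel
set.seed(1)
n_controls <- 50
panel <- paste0("gene", 1:20)
controls <- matrix(rnorm(length(panel) * n_controls, mean = 10, sd = 0.2),
                   nrow = length(panel), dimnames = list(panel, NULL))
patient <- setNames(rnorm(length(panel), mean = 10, sd = 0.2), panel)
patient["gene7"] <- 9  # simulate a 2-fold drop (-1 on the log2 scale)

# Z score of the patient relative to the control cohort, gene by gene
ctrl_mean <- rowMeans(controls)
ctrl_sd <- apply(controls, 1, sd)
z <- (patient - ctrl_mean) / ctrl_sd

# One-sided test of a single new observation against the controls:
# t statistic with n - 1 degrees of freedom, lower tail (reduced expression)
t_stat <- (patient - ctrl_mean) / (ctrl_sd * sqrt(1 + 1 / n_controls))
p_raw <- pt(t_stat, df = n_controls - 1)

# Bonferroni: multiply each p-value by the number of genes tested (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

naive_hits <- panel[p_bonf < 0.05]
z_hits <- panel[z < -3]
print(data.frame(gene = panel, z = round(z, 2), p_bonf = signif(p_bonf, 3)))
```

Because only 20 genes are tested, the Bonferroni factor is 20 rather than the number of all expressed genes, which is why a moderate drop can survive the correction in this setting.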

Is that possible in OUTRIDER without much struggle? Please give me some suggestions on how to tune the algorithm to find this outlier, and I will try to apply them.

Thanks!
Sergey

Dear @naumenko-sa,

thanks for taking the time to go through the OUTRIDER results and for reporting your findings.

We use the negative binomial distribution, which also takes the dispersion of the gene's expression into account. In this case the dispersion is quite high, so at first glance the case does not look like an outlier. We are currently working on a robust version of our algorithm, which we hope to release soon (in about 2 weeks). This may solve the problem.
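To illustrate why high dispersion can hide a 2-fold drop, here is a small base-R sketch. It is not OUTRIDER's actual p-value computation: the mean of 1000 counts, the observed 500 counts, and the three size values are hypothetical, and only a lower-tail probability is shown. It relies on R's parametrization of the negative binomial, where the variance is mu + mu^2 / size, so a small size means high dispersion.

```r
# Hypothetical numbers: expected count 1000, observed count 500 (2-fold lower)
mu <- 1000
observed <- 500

# Small 'size' = high dispersion; variance is mu + mu^2 / size
size <- c(5, 20, 100)

# Lower-tail probability of seeing a count this low or lower under the NB model
p_lower <- pnbinom(observed, size = size, mu = mu)

# Coefficient of variation implied by each size value
cv <- sqrt(mu + mu^2 / size) / mu

print(data.frame(size = size, cv = round(cv, 3), p_lower = signif(p_lower, 3)))
```

With the most dispersed setting, the same 2-fold drop has a much larger tail probability than with the least dispersed one, which is consistent with the explanation that a highly variable gene needs a more extreme value to be called an outlier.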

In the meantime, I would create the full result table with res <- results(ods, all=TRUE) and take the top n hits per sample. This resembles the naive approach of looking at extreme Z scores. Also, we correct per sample over all genes with Benjamini-Yekutieli, which is rather conservative. So you could select your genes of interest and apply the multiple-testing correction to that subset only.

Example code:

library(OUTRIDER)
ods <- makeExampleOutriderDataSet()
ods <- OUTRIDER(ods)
res <- results(ods, all=TRUE)

# Rank by P-value
res[order(pValue), p_rank:=1:.N, by=sampleID]
res[p_rank <= 10 & sampleID == 'sample19']

# Rank by Z score
res[order(abs(zScore), decreasing=TRUE), z_rank:=1:.N, by=sampleID]
res[z_rank <= 10 & sampleID == 'sample19']

# Correction over subset
subRes <- res[geneID %in% paste0('gene', sample(1:200, 10))]
subRes[,padjustSubset:=p.adjust(pValue, 'BY'), by=sampleID]
subRes[,plot(padjust, padjustSubset, log='xy')]
grid()
abline(0,1)

I will get back to you when the new release is out, so you can check whether the upcoming improvements resolve this.

Hi Sergey,

we updated our method, which will likely resolve your problem as well. It seems that we had implemented a bad default for q (the autoencoder's hidden dimension).

Please try again. Your plot from above should now look smoother (no 'stairs').
Also see my answer to issue #7.

Cheers,
Felix

Hi Felix!
Thanks for the update! I've updated OUTRIDER. Unfortunately, the main problem persists: I could not detect the known outlier because it is not significant. A simple Z-score cutoff with prioritization works. However, I am still using OUTRIDER to explore data and plot pictures, so I will cite it. Hopefully, you'll get accepted soon! SN