Quick question about number of samples and computeGeneLengths

Question

Closed this issue 2 years ago · 2 comments

I am working on RNA-seq data from a small cohort (12 samples) with a rare but heterogeneous disease. Although I can see clear outliers in the edgeR/Glimma expression plots, OUTRIDER seems to miss them. I am wondering whether this is in fact due to the small number of samples. How should I set the expression filters to tease out these differences? Also, for computeGeneLengths, what are your recommendations for counts processed using a UCSC GTF and annotation?

Thanks!

Answer 1 · 2019-11-29T08:37:58.000Z

Hi AmrR101:

It could indeed be your data set size. I would recommend finding comparable samples to add to your cohort, with similar RNA sequencing but with a different disease or even healthy (e.g. GTEx).
Second, you could try setting the encoding dimension to a smaller value. We are currently implementing a rule that fits well for data sets with many samples. You could try q = 2.
Or run findEncodingDim(ods, params = 2:5) and then use the best value.
Maybe this can rescue it, but potentially your data set is just too small for OUTRIDER to work.
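
For anyone landing here later, the two suggestions above could look roughly like the sketch below. It is only an illustration under assumptions: the simulated data set from makeExampleOutriderDataSet stands in for the real 12-sample cohort, and the object names (ods_q2, best_q, ods_best) are made up for this example. The filtering step and the call to results() are not part of the answer itself, they are just the usual steps around it.

```r
library(OUTRIDER)

# Placeholder data (assumption): a simulated 12-sample data set standing in
# for the real cohort. With real data, build the object from a count matrix
# via OutriderDataSet(countData = ...).
ods <- makeExampleOutriderDataSet(n = 200, m = 12, q = 2)

# Drop genes without any reads across the samples.
ods <- filterExpression(ods, minCounts = TRUE, filterGenes = TRUE)

# Suggestion 1: fix the encoding dimension at a small value.
ods_q2 <- OUTRIDER(ods, q = 2)

# Suggestion 2: search a small range of encoding dimensions
# and refit with the best one.
ods <- estimateSizeFactors(ods)
ods <- findEncodingDim(ods, params = 2:5)
best_q <- getBestQ(ods)
ods_best <- OUTRIDER(ods, q = best_q)

# Outlier calls at the default significance cutoff (may be empty).
res <- results(ods_best)
```

With only 12 samples the search over q can be noisy, so it may be worth comparing the calls from ods_q2 and ods_best rather than trusting either one alone.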

Answer 2 · 2023-01-03T10:12:07.000Z

As this issue did not get any updates for a while I will close it. @AmrR101, if you have any further questions please reopen this issue or open a new one.