Quick question about number of samples and computeGeneLengths
I am working on RNA-seq data from a small cohort (12 samples) with a rare but heterogeneous disease. Although I can see clear outliers in the edgeR/Glimma expression plots, OUTRIDER seems to miss them. I am wondering whether that is in fact due to the small number of samples. How should I set the expression filters to tease out these differences? Also, for computeGeneLengths, what are your recommendations for counts processed using a UCSC GTF and annotation?
Thanks!
Hi AmrR101:
- It could indeed be your data set size. I would recommend finding comparable samples with similar RNA sequencing, but from a different disease or even from healthy donors (GTEx).
- Second, you could try setting the encoding dimension to a smaller value: we are currently implementing a rule that fits well for data sets with many samples. You could try q = 2, or run findEncodingDim(ods, params = 2:5) and then use the best value.
- Maybe this can rescue it, but potentially your data set is just too small for OUTRIDER to work.
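
The second suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: `counts` is taken to be an integer gene-by-sample count matrix with gene IDs as row names, and `fit_small_cohort` is a hypothetical helper name, not part of OUTRIDER. The filtering step here only drops genes without any reads, so it does not need a GTF file.

```r
library(OUTRIDER)

# Hypothetical helper (not part of OUTRIDER). `counts` is assumed to be an
# integer gene-by-sample count matrix with gene IDs as row names.
fit_small_cohort <- function(counts, q_candidates = 2:5) {
    ods <- OutriderDataSet(countData = counts)

    # Drop genes with zero counts across all samples (no GTF required).
    ods <- filterExpression(ods, minCounts = TRUE, filterGenes = TRUE)
    ods <- estimateSizeFactors(ods)

    # Search only small encoding dimensions, as suggested for a small cohort.
    ods <- findEncodingDim(ods, params = q_candidates)
    best_q <- getBestQ(ods)

    # Fit the model with the selected encoding dimension and call outliers.
    ods <- OUTRIDER(ods, q = best_q)
    list(ods = ods, q = best_q, results = results(ods))
}

# Usage (assuming `counts` exists in your session):
# fit <- fit_small_cohort(counts)
# fit$q
# head(fit$results)
```

To try the fixed value directly instead of searching, `OUTRIDER(ods, q = 2)` can be called on the filtered data set.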