gagneurlab/OUTRIDER

The difference in sequencing volume and OUTRIDER

Opened this issue · 5 comments

zyyy97 commented

Hello,
Thank you for developing the package, it is very useful.
I have some questions. First, we recently performed transcriptome sequencing, and the amount of raw data differs by several G between samples, which causes the read counts to fluctuate. For example, one sample yielded 12G of raw data, while another yielded only 8G. Can I analyze these two samples together? Second, we have applied for access to a public database similar to GTEx. Can I integrate it with my data? The amount of data and the sequencing platform may both differ.
Do you have any suggestions for me?
Thank you
Happy

Hi. By G you mean GB?
You could try merging samples with different seq depth and then check their size factors to see how much they actually differ from each other.
Regarding merging with external data, it could work as long as it's the same tissue and genome build. Preferably, the samples should have been aligned using the same platform, but it could be fine if not. We recommend checking the correlation heatmaps before and after normalization. Be sure the samples were counted using the same GTF file and parameters.
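To illustrate the size-factor check, here is a minimal base R sketch of the median-of-ratios estimator (the DESeq2-style method) on a made-up count matrix. The matrix, the sample names and the helper `median_of_ratios` are hypothetical and only for illustration. On a real dataset one would instead call `estimateSizeFactors(ods)` and inspect `sizeFactors(ods)`.

```r
# Toy count matrix (made-up numbers): genes in rows, samples in columns.
# The second sample is sequenced 1.5x deeper than the first, like 12 GB vs 8 GB.
counts <- matrix(
  c(100, 150,
    200, 300,
     50,  75,
    400, 600),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("sample_8GB", "sample_12GB"))
)

# Median-of-ratios size factors:
# 1. compute the per-gene geometric mean across samples (on the log scale),
# 2. take, per sample, the median log ratio of its counts to that reference.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)
  keep <- is.finite(log_geo_means)  # drops genes with a zero count in any sample
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[keep]))
  })
}

size_factors <- median_of_ratios(counts)
print(size_factors)
```

If the size factors of the shallow and deep samples differ only by roughly the ratio of their sequencing depths, normalization should absorb most of the difference. For the heatmaps, `plotCountCorHeatmap(ods, normalized=FALSE)` and `plotCountCorHeatmap(ods, normalized=TRUE)` give the before and after views once the model has been fitted.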

zyyy97 commented

Thank you very much for your reply.
First of all, I want to say that G stands for GB. I'd like to give you an overview of my entire analysis process.
I used featureCounts to quantify all the samples, then preprocessed the raw counts file it exported and passed the result to the OUTRIDER package to detect outliers. This is how I processed the data:
library(tidyverse)
library(data.table)
library(OUTRIDER)  # provides OutriderDataSet, filterExpression, OUTRIDER, results

# Read the featureCounts output, skipping the leading '#' comment line
counts <- read.delim("/Desktop/featurecount.txt", comment.char = "#")
head(counts)

# Drop the annotation columns (Chr, Start, End, Strand, Length) and use Geneid as row names
counts1 <- counts[, -c(2:6)]
rownames(counts1) <- counts1$Geneid
counts1 <- counts1[, -1]

# Per-gene median count, used below to pick one gene ID per symbol
ids <- data.frame(geneid = rownames(counts1),
                  median = apply(counts1, 1, median))

# Load the gene ID to symbol table extracted from the GENCODE GTF file
g2s <- fread("/Desktop/g2s_vm25_gencode.txt", header = FALSE, data.table = FALSE)
colnames(g2s) <- c("geneid", "symbol")
table(ids$geneid %in% g2s$geneid)
ids <- ids[ids$geneid %in% g2s$geneid, ]
ids$symbol <- g2s[match(ids$geneid, g2s$geneid), 2]

# For duplicated symbols, keep the gene ID with the highest median count
ids <- ids[order(ids$symbol, ids$median, decreasing = TRUE), ]
dim(ids); table(duplicated(ids$symbol))
ids <- ids[!duplicated(ids$symbol), ]
counts1 <- counts1[ids$geneid, ]
rownames(counts1) <- ids[match(rownames(counts1), ids$geneid), "symbol"]

# Keep genes with a mean count above 1, then run OUTRIDER
counts1 <- counts1[rowMeans(counts1) > 1, ]
ctsTable_2 <- as.data.frame(counts1)
ods <- OutriderDataSet(countData = ctsTable_2)
ods <- filterExpression(ods, minCounts = TRUE, filterGenes = TRUE)
ods <- OUTRIDER(ods)
res <- results(ods)
I actually followed the "A quick tour" section of the instructions. It looks like I made a mistake and need to follow the "An OUTRIDER analysis in detail" section and try again, right? I ask because I didn't compute the FPKM values.
I have also noted your suggestions. I will think them over and run the analysis again.
Thank you very much
Happy

zyyy97 commented

Hi,
I have some more questions. Could I have your email address so that I can send you some of my results?
Thanks

Hi,
You do not need to compute the FPKM values with a separate function. They are already computed and added to the ods object when you run filterExpression.
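For reference, this is roughly what an FPKM value is, as a minimal base R sketch of the textbook formula on made-up numbers. The counts, gene lengths and the helper `fpkm_sketch` are hypothetical, and the computation inside OUTRIDER may differ in detail (for example in how the library size is estimated), so treat this as an illustration only.

```r
# Hypothetical toy inputs: raw counts (genes x samples) and gene lengths in base pairs.
counts <- matrix(c( 500, 1000,
                   1500, 3000),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
gene_length_bp <- c(geneA = 1000, geneB = 3000)

# FPKM = count * 1e9 / (gene length in bp * total counted fragments in the sample)
fpkm_sketch <- function(counts, gene_length_bp) {
  lib_size <- colSums(counts)                 # total counted fragments per sample
  per_kb <- counts / (gene_length_bp / 1e3)   # each row divided by its gene length in kb
  sweep(per_kb, 2, lib_size / 1e6, "/")       # each column divided by library size in millions
}

fpkm_values <- fpkm_sketch(counts, gene_length_bp)
print(fpkm_values)
```

In this toy example the second sample has twice the depth of the first, and after scaling by gene length and library size both samples give the same value for each gene.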
Sure, it's yepez at in.tum.de

zyyy97 commented

Hello,
I have sent you an email. Have you received it? If not, I will send it again.
Thanks