NA's in rse_tx_brain
Closed this issue · 3 comments
Hi,
I downloaded the rse_tx_brain from https://jhubiostatistics.shinyapps.io/recount/ for both TCGA and GTEX. I only renamed them as "rse_tx_brain_TCGA.RData" and "rse_tx_brain_GTEX.RData" ( otherwise both objects would be called "rse_tx_brain.RData"
When I load both the objects, I notice a lot of NA's for a lot of transcripts - I was wondering why that is ? The NA's most likely make getRPKM() fail too.
> rm(list=ls())
> library(SummarizedExperiment)
> setwd("~/HollandLabShared/Sonali/recount_brain")
> gtex = get(load("rse_tx_brain_GTEX.Rdata"))
> tcga = get(load("rse_tx_brain_TCGA.RData"))
>
> tcga_rpkm = getRPKM(tcga)
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
> gtex_rpkm = getRPKM(gtex)
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
I was under the impression that maybe the same transcript is NA for all samples - but that is not the case.
> length(which(is.na(assay(gtex[,1]))))
[1] 522
> length(which(is.na(assay(gtex[,151]))))
[1] 436
>
> length(which(is.na(assay(gtex[,20]))))
[1] 892
> length(which(is.na(assay(gtex[,251]))))
[1] 436
> summary(as.numeric(assay(tcga[grep("ENST00000304053.10", rownames(tcga)), ])))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0 0 0 0 0 0 543
Kindly advise.
Thanks,
Sonali.
Hi Sonali!
Thanks for the question!
Ahh, yeah, I see how the names can clash. We recommend saving each object inside a directory for the given project, so SRP012682/rse_tx_brain.RData
for GTEx and TCGA/rse_tx_brain.RData
for TCGA. I should try to resolve this along with #15 and #16! Sorry for the delay on those!
As for the NAs on the rse_tx
files, Jack Fu @JMF47, the lead author behind the transcript files should hopefully be able to respond after the long weekend break.
Best,
Leo
hi @sonali-bioc, one of the factors that determines the number of NA transcripts reported is the read-length of sequencing. The read-length affects how well we can deconvolve exon+junction counts into transcript-level counts. Could you check if for instance samples gtex[,151]
and gtex[,251]
have the same read lengths?
I'm closing this issue since it seems inactive.