Mutations Almost All Missing
No matter the cancer type, the number of non-missing entries in the mutation assay is always equal to the number of ranges (rows).
> library(curatedTCGAData)
> headNeck <- curatedTCGAData("HNSC", "Mutation", dry.run = FALSE, version = "2.0.1")
> dim(assays(headNeck)[[1]])
[1] 51799   279
> table(is.na(assays(headNeck)[[1]]))
   FALSE     TRUE
   51799 14400122
> melanoma <- curatedTCGAData("UVM", "Mutation", dry.run = FALSE, version = "2.0.1")
> dim(assays(melanoma)[[1]])
[1] 2174   80
> table(is.na(assays(melanoma)[[1]]))
 FALSE   TRUE
  2174 171746
> assays(melanoma)[[1]][1:5, 1:5]
TCGA-RZ-AB0B-01A-11D-A39W-08 TCGA-V3-A9ZX-01A-11D-A39W-08 TCGA-V3-A9ZY-01A-11D-A39W-08 TCGA-V4-A9E5-01A-11D-A39W-08 TCGA-V4-A9E7-01A-11D-A39W-08
18:9550172:+ "PPP4R1" NA NA NA NA
13:79175838:+ "POU4F1" NA NA NA NA
6:38828378:+ "DNAH8" NA NA NA NA
19:55086935:+ "LILRA2" NA NA NA NA
1:11169412:+ "MTOR" NA NA NA NA
> sessionInfo()
R version 4.1.0 (2021-05-18)
This implies that every mutation occurs in only one sample, which is unlikely to be real.
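Because each row of the matrix returned by assays(headNeck)[[1]] is a single mutation record in a single sample, per-gene frequency has to be computed by counting distinct samples per gene rather than non-missing cells per row. A minimal base R sketch, using a hypothetical toy matrix laid out like the output above (gene symbol stored in the cell of the sample that carries the mutation); the gene names and sample IDs are made up, and with the real data sm would be assays(headNeck)[[1]]:

```r
# Hypothetical toy stand-in for assays(headNeck)[[1]]: one row per mutation
# record, one column per sample, gene symbol in the carrying sample's cell.
sm <- matrix(NA_character_, nrow = 5, ncol = 3,
             dimnames = list(paste0("mut", 1:5), c("S1", "S2", "S3")))
sm["mut1", "S1"] <- "geneA"
sm["mut2", "S1"] <- "geneB"
sm["mut3", "S1"] <- "geneB"   # a second geneB record in the same sample
sm["mut4", "S2"] <- "geneB"
sm["mut5", "S3"] <- "geneC"

# As in the issue, every row has exactly one non-missing cell by construction.
stopifnot(all(rowSums(!is.na(sm)) == 1L))

# Collect (gene, sample) pairs from the non-missing cells.
hits <- which(!is.na(sm), arr.ind = TRUE)
calls <- data.frame(gene = sm[hits], sample = colnames(sm)[hits[, "col"]])

# Count each gene at most once per sample, then tabulate across samples.
calls <- unique(calls)
freq <- sort(table(calls$gene), decreasing = TRUE)
freq
```

Here geneB is recurrent (two samples) even though no row of sm has more than one non-missing entry.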
Hi Dario,
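That's not how we measure mutation frequency. The matrix returned by assays() is the sparse representation: one row per mutation record (range) and one column per sample, so each row has exactly one non-missing entry by construction.
You are working with a RaggedExperiment, and I would encourage you to read the vignette for more information: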
https://bioconductor.org/packages/release/bioc/vignettes/RaggedExperiment/inst/doc/RaggedExperiment.html
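The difference between the sparse and compact views can be seen on a small example. The sketch below is adapted from the two-sample example in the RaggedExperiment vignette and assumes the RaggedExperiment and GenomicRanges Bioconductor packages are installed. sparseAssay() keeps one row per range per sample, while compactAssay() merges identical ranges across samples, so a mutation at the same position in two samples appears as two non-missing entries in one row.

```r
library(GenomicRanges)
library(RaggedExperiment)

# Two toy samples; the range chr1:1-10:- is present in both.
sample1 <- GRanges(c(A = "chr1:1-10:-", B = "chr1:8-14:+", C = "chr2:15-18:+"),
                   score = 3:5)
sample2 <- GRanges(c(D = "chr1:1-10:-", E = "chr2:11-18:+"),
                   score = 1:2)
ragexp <- RaggedExperiment(sample1 = sample1, sample2 = sample2,
                           colData = DataFrame(id = 1:2))

# Sparse view: one row per range per sample (5 rows), one non-NA entry each.
sparse <- sparseAssay(ragexp)
rowSums(!is.na(sparse))

# Compact view: identical ranges collapsed (4 rows); the shared range has 2.
compact <- compactAssay(ragexp)
rowSums(!is.na(compact))
```

For the TCGA data, the same functions would be applied to the RaggedExperiment itself (for example experiments(headNeck)[[1]]) rather than to the matrix returned by assays().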
See here for an example of how to find non-silent mutations:
https://github.com/Bioconductor/RaggedExperiment/blob/master/inst/scripts/assay-functions-Ex.R
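The linked script works with RaggedExperiment objects; as a plain base R alternative (not the script's method), the same idea can be sketched on a long-format table of mutation calls. The table below is hypothetical toy data: the column names Hugo_Symbol and Variant_Classification mirror MAF fields, but whether and how such a table is extracted from the TCGA object is an assumption here.

```r
# Hypothetical long-format mutation calls (toy data, MAF-like column names).
calls <- data.frame(
  Hugo_Symbol = c("geneA", "geneA", "geneA", "geneB", "geneC"),
  sample      = c("S1", "S1", "S2", "S2", "S3"),
  Variant_Classification = c("Silent", "Missense_Mutation", "Frame_Shift_Del",
                             "Nonsense_Mutation", "Silent")
)

# Drop silent calls, count each gene at most once per sample,
# then tabulate the number of samples with a non-silent mutation per gene.
nonsilent <- calls[calls$Variant_Classification != "Silent", ]
nonsilent <- unique(nonsilent[, c("Hugo_Symbol", "sample")])
freq_ns <- table(nonsilent$Hugo_Symbol)
freq_ns
```

geneC drops out entirely because its only call is silent, and geneA counts once in S1 despite having two calls there.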
If you have further questions, please create a support.bioconductor.org post.
Thank you!
Best,
Marcel