How to get molecularData without entrezGeneIds

Question

Closed this issue 2 years ago · 3 comments

limbo1996 commented 3 years ago

Hello
When I use molecularData(), I find that entrezGeneIds is required. For example:

library(cBioPortalData)
cbio <- cBioPortal()  # API connection object used by molecularData()

test <- molecularData(cbio,
                      molecularProfileId = "acc_tcga_rna_seq_v2_mrna",
                      entrezGeneIds = 1:1000, # a range of entrezGeneIds
                      sampleIds = c("TCGA-OR-A5J1-01", "TCGA-OR-A5J2-01")
                      )

But this returns only part of the data for these two samples.
How do I get all the data for these two samples in acc_tcga_rna_seq_v2_mrna without having to specify a range of entrezGeneIds?
Thanks a lot.

Answer 1 · 2022-05-12T15:07:15.000Z

Hi @limbo1996
The API was designed to take slices of the data, so entrezGeneIds or Hugo symbols are required.
If you'd like to get all the data, you can try the bulk method by doing:

acc <- cBioDataPack("acc_tcga")
acc
#' A MultiAssayExperiment object of 11 listed
#'  experiments with user-defined names and respective classes.
#'  Containing an ExperimentList class object of length 11:
#'  [1] cna_hg19.seg: RaggedExperiment with 16080 rows and 90 columns
#'  [2] CNA: SummarizedExperiment with 24776 rows and 90 columns
#'  [3] linear_CNA: SummarizedExperiment with 24776 rows and 90 columns
#'  [4] methylation_hm450: SummarizedExperiment with 15755 rows and 80 columns
#'  [5] mutations_extended: RaggedExperiment with 20166 rows and 90 columns
#'  [6] mutations_mskcc: RaggedExperiment with 20166 rows and 90 columns
#'  [7] RNA_Seq_v2_expression_median: SummarizedExperiment with 20531 rows and 79 columns
#'  [8] RNA_Seq_v2_mRNA_median_all_sample_Zscores: SummarizedExperiment with 20531 rows and 79 columns
#'  [9] RNA_Seq_v2_mRNA_median_Zscores: SummarizedExperiment with 20440 rows and 79 columns
#'  [10] rppa_Zscores: SummarizedExperiment with 191 rows and 46 columns
#'  [11] rppa: SummarizedExperiment with 192 rows and 46 columns
#' Functionality:
#'  experiments() - obtain the ExperimentList instance
#'  colData() - the primary/phenotype DataFrame
#'  sampleMap() - the sample coordination DataFrame
#'  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#'  *Format() - convert into a long or wide DataFrame
#'  assays() - convert ExperimentList to a SimpleList of matrices
#'  exportClass() - save data to flat files

Then filter the result by sample ID.
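A minimal sketch of that filtering step, assuming the assay columns are named by sample ID (for example "TCGA-OR-A5J1-01") and that the experiment is called RNA_Seq_v2_expression_median as in the printout above. The experiment name can differ between package versions, so check names(acc) first. keep_samples is a hypothetical helper written for this example, not part of cBioPortalData.

```r
library(cBioPortalData)
library(SummarizedExperiment)

# Hypothetical helper (not part of cBioPortalData): keep only the requested
# sample columns that are actually present in the object.
keep_samples <- function(x, samples) {
    x[, intersect(samples, colnames(x)), drop = FALSE]
}

acc <- cBioDataPack("acc_tcga")
samples <- c("TCGA-OR-A5J1-01", "TCGA-OR-A5J2-01")

# Assumption: experiment name taken from the printed object; see names(acc).
rna <- acc[["RNA_Seq_v2_expression_median"]]
rna_sub <- keep_samples(rna, samples)

# Expression matrix: all genes (rows) by the selected samples (columns).
expr <- assay(rna_sub)
dim(expr)
```

Pulling out a single experiment and subsetting its columns avoids relying on how the whole MultiAssayExperiment maps sample IDs to patient IDs. Requested samples that are absent from the assay are silently dropped by intersect().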

Answer 2 · 2022-05-13T04:01:05.000Z

@LiNk-NY Thanks for your reply!
But when I use cBioDataPack, it returns these warnings:

Warning messages:
1: Unable to import: mrna_seq_v2_rsem
Reason: missing value where TRUE/FALSE needed 
2: Unable to import: mrna_seq_v2_rsem_zscores_ref_all_samples
Reason: missing value where TRUE/FALSE needed 
3: In .find_with_xfix(df_colnames, get(paste0(fix, 1)), get(paste0(fix,  :
   Multiple prefixes found, using keyword 'region' or taking first one
4: In .find_with_xfix(df_colnames, get(paste0(fix, 1)), get(paste0(fix,  :
   Multiple prefixes found, using keyword 'region' or taking first one

So what can I do to import all of the mRNA files?
Thanks again!

Answer 3 · 2022-05-13T15:36:06.000Z

Hi @limbo1996

Thanks for pointing this out. It seems to be an issue with missing rownames in the data and the way the SummarizedExperiment constructor function handles name checks. If interested, you can follow the issue here:

Bioconductor/SummarizedExperiment#64

As far as I can tell, there is also an issue on the curation side: when reading the data manually, there are NA values in the Hugo_Symbol column.
You can use downloadStudy and then untarStudy to inspect the contents of the tarball.
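A rough sketch of that inspection, assuming the expression file inside the tarball is named data_mrna_seq_v2_rsem.txt (inferred from the warning message, so please check the file listing first). count_missing is a hypothetical helper for this example only.

```r
library(cBioPortalData)

# Hypothetical helper: count entries that are NA or empty strings.
count_missing <- function(x) {
    x <- as.character(x)
    sum(is.na(x) | !nzchar(x))
}

# Download the study tarball and unpack it into a temporary directory.
acc_file <- downloadStudy("acc_tcga")
acc_dir <- untarStudy(acc_file, exdir = tempdir())

# List what the tarball contains.
list.files(acc_dir, recursive = TRUE)

# Assumption: the file name below is inferred from the warning text.
expr_file <- list.files(
    acc_dir, pattern = "data_mrna_seq_v2_rsem\\.txt$",
    recursive = TRUE, full.names = TRUE
)

if (length(expr_file) > 0) {
    expr <- read.delim(expr_file[1], comment.char = "#", check.names = FALSE)
    # Number of rows without a usable gene symbol.
    count_missing(expr$Hugo_Symbol)
}
```

Rows with a missing Hugo_Symbol would end up without rownames, which is consistent with the import failure described above.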

Best,
Marcel