converting Mutations components to maftools-friendly form?
I am finding it challenging to convert the RaggedExperiment to a more MAF-like
tabular form. Am I missing something? Maybe we should add a component with
MAF content, perhaps as a dense GRanges, named "MAF"? I think this would
be used more readily, and we already have code that converts MAF to RaggedExperiment,
which could be provided as a tool.
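For reference, here is a minimal sketch of one way to get a long, MAF-like table out of a RaggedExperiment, going through GRangesList. The toy ranges, the sample names and the values are made up for illustration; only the column names Hugo_Symbol and Variant_Classification are borrowed from the MAF format. This is not the conversion code mentioned above, just an assumption about what a simple route might look like.

```r
library(GenomicRanges)
library(RaggedExperiment)

# Toy data (hypothetical): two samples with different numbers of mutation records.
s1 <- GRanges(c("chr1:100-100", "chr2:200-200"),
              Hugo_Symbol = c("GENE_A", "GENE_B"),
              Variant_Classification = c("Missense_Mutation", "Silent"))
s2 <- GRanges("chr1:150-150",
              Hugo_Symbol = "GENE_A",
              Variant_Classification = "Nonsense_Mutation")
re <- RaggedExperiment(sample1 = s1, sample2 = s2)

# Coerce to a GRangesList (one element per sample), then flatten to a
# data.frame with one row per mutation record.
grl <- as(re, "GRangesList")
maf_like <- as.data.frame(grl)

# Inspect the result: range coordinates plus the per-record metadata columns.
head(maf_like)
```

The resulting data.frame is not a validated MAF file; tools such as maftools expect specific column names, so some renaming would likely still be needed.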
A couple of thoughts:
- A MAF-like form wouldn't be compatible with MultiAssayExperiment, so a coercion method to DataFrame would probably belong in the RaggedExperiment package.
- It is a pain to convert these TCGA RaggedExperiments to matrices, or to a RangedSummarizedExperiment with one row per gene to match the RNA-seq datasets. @vjcitn and @LiNk-NY, would you try out the helper function in this gist and see if you find it useful? Currently it just converts the RaggedExperiments to genes x samples matrices, but I could easily have it convert to RangedSummarizedExperiment:
https://gist.github.com/lwaldron/47fb0c0bece56f58b762192c24117231
The gist now converts the RaggedExperiments to RangedSummarizedExperiments, instead of matrices.
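As an illustration of the general idea (not the gist's actual code), a genes x samples summary can be sketched with qreduceAssay() from RaggedExperiment. The gene ranges, gene names, the "mutated" score column and the any_mutation() helper below are all hypothetical; a real workflow would take gene ranges from an annotation package.

```r
library(GenomicRanges)
library(RaggedExperiment)
library(SummarizedExperiment)

# Toy data (hypothetical): each mutation record carries a score of 1.
s1 <- GRanges(c("chr1:100-100", "chr2:200-200"), mutated = c(1L, 1L))
s2 <- GRanges("chr1:150-150", mutated = 1L)
re <- RaggedExperiment(sample1 = s1, sample2 = s2)

# Hypothetical gene ranges to summarize over.
genes <- GRanges(c(GENE_A = "chr1:50-300",
                   GENE_B = "chr2:150-400",
                   GENE_C = "chr3:1-100"))

# simplifyReduce is called with list-like scores and ranges, one element per
# (gene, sample) pair that has overlaps; it returns one value per element.
any_mutation <- function(scores, ranges, qranges) {
    as.integer(sum(scores) > 0)
}

# Genes x samples matrix; pairs with no overlapping record keep the
# background value (NA by default).
mat <- qreduceAssay(re, genes, any_mutation, i = "mutated")
rownames(mat) <- names(genes)

# Wrap the matrix with its gene ranges as a RangedSummarizedExperiment.
rse <- SummarizedExperiment(assays = list(mutated = mat), rowRanges = genes)
```

Whether NA or 0 is the right background for unmutated genes depends on the analysis; the background argument of qreduceAssay() controls that choice.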
Back to my first point: this coercion method could be useful for GRangesList as well as for RaggedExperiment, so it's not just a RaggedExperiment question.
@vjcitn and @LiNk-NY take a look at the conveniencefuns branch I just pushed. It's far from perfect but does the following:
> accmae <- curatedTCGAData("ACC", c("CNASNP", "Mutation", "miRNASeqGene", "GISTICT"), dry.run = FALSE)
> accmae
A MultiAssayExperiment object of 4 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 4:
[1] ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 180 columns
[2] ACC_GISTIC_ThresholdedByGene-20160128: SummarizedExperiment with 24776 rows and 90 columns
[3] ACC_miRNASeqGene-20160128: SummarizedExperiment with 1046 rows and 80 columns
[4] ACC_Mutation-20160128: RaggedExperiment with 20166 rows and 90 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
> simplemae <- simplifyTCGA(accmae)
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
> simplemae
A MultiAssayExperiment object of 6 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 6:
[1] ACC_Mutation-20160128_simplified: RangedSummarizedExperiment with 22945 rows and 90 columns
[2] ACC_CNASNP-20160128_simplified: RangedSummarizedExperiment with 22945 rows and 180 columns
[3] ACC_miRNASeqGene-20160128_ranged: RangedSummarizedExperiment with 1002 rows and 80 columns
[4] ACC_miRNASeqGene-20160128_unranged: SummarizedExperiment with 44 rows and 80 columns
[5] ACC_GISTIC_ThresholdedByGene-20160128_ranged: RangedSummarizedExperiment with 19601 rows and 90 columns
[6] ACC_GISTIC_ThresholdedByGene-20160128_unranged: SummarizedExperiment with 5175 rows and 90 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
> rownames(simplemae)
CharacterList of length 6
[["ACC_Mutation-20160128_simplified"]] A1BG NAT2 ADA CDH2 AKT3 ... KCNE2 DGCR2 CASP8AP2 SCO2
[["ACC_CNASNP-20160128_simplified"]] A1BG NAT2 ADA CDH2 AKT3 ... KCNE2 DGCR2 CASP8AP2 SCO2
[["ACC_miRNASeqGene-20160128_ranged"]] hsa-let-7a-1 hsa-let-7a-2 ... hsa-mir-99a hsa-mir-99b
[["ACC_miRNASeqGene-20160128_unranged"]] hsa-mir-103-1 hsa-mir-103-1-as ... hsa-mir-941-4
[["ACC_GISTIC_ThresholdedByGene-20160128_ranged"]] ACAP3 ACTRT2 AGRN ... SNORA56 TMLHE VBP1
[["ACC_GISTIC_ThresholdedByGene-20160128_unranged"]] C1orf170 ... WASIR1|ENSG00000185203.7
>