column names of colData of RangedSummarizedExperiment inconsistent?
Hi, I noticed that the names of the columns in the row data table for a RangedSummarizedExperiment object seem to be inconsistent with the data in the columns of the table, unless I am misunderstanding something (I am an R novice).
I download a RangedSummarizedExperiment as follows:
url <- download_study('SRP009615')
load(file.path('SRP009615', 'rse_gene.Rdata'))
rowData(rse_gene)
I get a DataFrame with 21 columns. The order of the names of the columns does not seem to coincide with the data in that column. For example, the first column name is "project"; however, the first column seems to contain the run accession. Is this a bug? Or is there another way I am supposed to find the name of each column?
Thanks!
Hi @mbernste,
Since you mentioned that you are a new R user, you might want to check the vignette for the SummarizedExperiment package.
From your code, I believe that you meant to check colData() (information about the samples) instead of rowData() (information about the genes). The row names of the column data correspond to the SRA run identifier, not the SRA project identifier. The Sequence Read Archive (SRA) has multiple identifiers, and the one that specifies a given sample is the run identifier.
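To make the rowData() versus colData() distinction concrete, here is a minimal sketch on a toy object rather than the recount data. The accessions, gene names, and lengths below are made up for illustration; only the SummarizedExperiment() constructor, DataFrame(), and the accessor functions are real package functions.

```r
## Toy example (hypothetical values, not the SRP009615 data):
## 3 genes (rows) x 2 samples (columns)
library('SummarizedExperiment')

runs <- c('SRR000001', 'SRR000002')
genes <- c('geneA', 'geneB', 'geneC')
counts <- matrix(1:6, nrow = 3, dimnames = list(genes, runs))

## Sample metadata: one row per sample, row names are the run accessions
sample_info <- DataFrame(
    project = c('SRP000001', 'SRP000001'),
    run = runs,
    row.names = runs
)

## Gene metadata: one row per gene
gene_info <- DataFrame(
    gene_id = genes,
    bp_length = c(100L, 200L, 300L),
    row.names = genes
)

se <- SummarizedExperiment(
    assays = list(counts = counts),
    rowData = gene_info,
    colData = sample_info
)

## rowData() describes the genes, colData() describes the samples
dim(rowData(se))
dim(colData(se))

## Column names of the sample table, and its row names (the run accessions)
colnames(colData(se))
rownames(colData(se))
```

When a DataFrame is printed, the row names appear as an unlabeled leading column, so they can look like a first data column that is out of step with the column headers.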
In the future, I encourage you to use the Bioconductor support website https://support.bioconductor.org/, which has higher visibility, since other people might have the same questions you have. Remember to use tags!
Best,
Leonardo
Un-evaluated code
library('recount')
library('devtools')
## Code from mbernste
url <- download_study('SRP009615')
load(file.path('SRP009615', 'rse_gene.Rdata'))
rowData(rse_gene)
## Explore the column data, not the row one
dim(colData(rse_gene))
colData(rse_gene)[, 1:4]
identical(rownames(colData(rse_gene)), colData(rse_gene)$run)
## Reproducibility info
proc.time()
message(Sys.time())
options(width = 120)
session_info()
Evaluated code
> library('recount')
> library('devtools')
>
> ## Code from mbernste
> url <- download_study('SRP009615')
2017-05-05 12:09:09 downloading file rse_gene.Rdata to SRP009615
trying URL 'http://duffel.rail.bio/recount/SRP009615/rse_gene.Rdata'
Content type 'application/octet-stream' length 3120155 bytes (3.0 MB)
==================================================
downloaded 3.0 MB
> load(file.path('SRP009615', 'rse_gene.Rdata'))
> rowData(rse_gene)
DataFrame with 58037 rows and 3 columns
gene_id bp_length symbol
<character> <integer> <CharacterList>
1 ENSG00000000003.14 4535 TSPAN6
2 ENSG00000000005.5 1610 TNMD
3 ENSG00000000419.12 1207 DPM1
4 ENSG00000000457.13 6883 SCYL3
5 ENSG00000000460.16 5967 C1orf112
... ... ... ...
58033 ENSG00000283695.1 61 NA
58034 ENSG00000283696.1 997 NA
58035 ENSG00000283697.1 1184 LOC101928917
58036 ENSG00000283698.1 940 NA
58037 ENSG00000283699.1 60 MIR4481
>
> ## Explore the column data, not the row one
> dim(colData(rse_gene))
[1] 12 21
> colData(rse_gene)[, 1:4]
DataFrame with 12 rows and 4 columns
project sample experiment run
<character> <character> <character> <character>
SRR387777 SRP009615 SRS281685 SRX110461 SRR387777
SRR387778 SRP009615 SRS281686 SRX110462 SRR387778
SRR387779 SRP009615 SRS281687 SRX110463 SRR387779
SRR387780 SRP009615 SRS281688 SRX110464 SRR387780
SRR389077 SRP009615 SRS282369 SRX111299 SRR389077
... ... ... ... ...
SRR389080 SRP009615 SRS282372 SRX111302 SRR389080
SRR389081 SRP009615 SRS282373 SRX111303 SRR389081
SRR389082 SRP009615 SRS282374 SRX111304 SRR389082
SRR389083 SRP009615 SRS282375 SRX111305 SRR389083
SRR389084 SRP009615 SRS282376 SRX111306 SRR389084
> identical(rownames(colData(rse_gene)), colData(rse_gene)$run)
[1] TRUE
>
> ## Reproducibility info
> proc.time()
user system elapsed
14.981 2.365 162.647
> message(Sys.time())
2017-05-05 12:09:10
> options(width = 120)
> session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 3.4.0 (2017-04-21)
system x86_64, darwin15.6.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2017-05-05
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.4.1 2016-10-29 CRAN (R 3.4.0)
AnnotationDbi 1.38.0 2017-04-25 Bioconductor
backports 1.0.5 2017-01-18 CRAN (R 3.4.0)
base64enc 0.1-3 2015-07-28 CRAN (R 3.4.0)
Biobase * 2.36.2 2017-05-04 Bioconductor
BiocGenerics * 0.22.0 2017-04-25 Bioconductor
BiocParallel 1.10.1 2017-05-03 Bioconductor
biomaRt 2.32.0 2017-04-26 Bioconductor
Biostrings 2.44.0 2017-04-25 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.4.0)
BSgenome 1.44.0 2017-04-25 Bioconductor
bumphunter 1.16.0 2017-04-25 Bioconductor
checkmate 1.8.2 2016-11-02 CRAN (R 3.4.0)
cluster 2.0.6 2017-03-10 CRAN (R 3.4.0)
codetools 0.2-15 2016-10-05 CRAN (R 3.4.0)
colorspace 1.3-2 2016-12-14 CRAN (R 3.4.0)
data.table 1.10.4 2017-02-01 CRAN (R 3.4.0)
DBI 0.6-1 2017-04-01 CRAN (R 3.4.0)
DelayedArray * 0.2.0 2017-04-25 Bioconductor
derfinder 1.10.0 2017-04-25 Bioconductor
derfinderHelper 1.10.0 2017-04-25 Bioconductor
devtools * 1.12.0 2016-12-05 CRAN (R 3.4.0)
digest 0.6.12 2017-01-27 CRAN (R 3.4.0)
doRNG 1.6.6 2017-04-10 CRAN (R 3.4.0)
downloader 0.4 2015-07-09 CRAN (R 3.4.0)
foreach 1.4.3 2015-10-13 CRAN (R 3.4.0)
foreign 0.8-68 2017-04-24 CRAN (R 3.4.0)
Formula 1.2-1 2015-04-07 CRAN (R 3.4.0)
GenomeInfoDb * 1.12.0 2017-04-25 Bioconductor
GenomeInfoDbData 0.99.0 2017-02-14 Bioconductor
GenomicAlignments 1.12.0 2017-04-25 Bioconductor
GenomicFeatures 1.28.0 2017-04-26 Bioconductor
GenomicFiles 1.12.0 2017-04-26 Bioconductor
GenomicRanges * 1.28.1 2017-05-03 Bioconductor
GEOquery 2.42.0 2017-04-25 Bioconductor
ggplot2 2.2.1 2016-12-30 CRAN (R 3.4.0)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.4.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
Hmisc 4.0-3 2017-05-02 CRAN (R 3.4.0)
htmlTable 1.9 2017-01-26 CRAN (R 3.4.0)
htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
htmlwidgets 0.8 2016-11-09 CRAN (R 3.4.0)
httr 1.2.1 2016-07-03 CRAN (R 3.4.0)
IRanges * 2.10.0 2017-04-25 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.4.0)
jsonlite 1.4 2017-04-08 CRAN (R 3.4.0)
knitr 1.15.1 2016-11-22 CRAN (R 3.4.0)
lattice 0.20-35 2017-03-25 CRAN (R 3.4.0)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.4.0)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.4.0)
magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
Matrix 1.2-10 2017-04-28 CRAN (R 3.4.0)
matrixStats * 0.52.2 2017-04-14 CRAN (R 3.4.0)
memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.4.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.4.0)
pkgmaker 0.22 2014-05-14 CRAN (R 3.4.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
qvalue 2.8.0 2017-04-25 Bioconductor
R6 2.2.0 2016-10-05 CRAN (R 3.4.0)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.4.0)
Rcpp 0.12.10 2017-03-19 CRAN (R 3.4.0)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.4.0)
recount * 1.2.0 2017-04-25 Bioconductor
registry 0.3 2015-07-08 CRAN (R 3.4.0)
rentrez 1.0.4 2016-10-26 CRAN (R 3.4.0)
reshape2 1.4.2 2016-10-22 CRAN (R 3.4.0)
rngtools 1.2.4 2014-03-06 CRAN (R 3.4.0)
rpart 4.1-11 2017-03-13 CRAN (R 3.4.0)
Rsamtools 1.28.0 2017-04-25 Bioconductor
RSQLite 1.1-2 2017-01-08 CRAN (R 3.4.0)
rtracklayer 1.36.0 2017-04-25 Bioconductor
S4Vectors * 0.14.0 2017-04-25 Bioconductor
scales 0.4.1 2016-11-09 CRAN (R 3.4.0)
stringi 1.1.5 2017-04-07 CRAN (R 3.4.0)
stringr 1.2.0 2017-02-18 CRAN (R 3.4.0)
SummarizedExperiment * 1.6.1 2017-05-03 Bioconductor
survival 2.41-3 2017-04-04 CRAN (R 3.4.0)
tibble 1.3.0 2017-04-01 CRAN (R 3.4.0)
VariantAnnotation 1.22.0 2017-04-25 Bioconductor
withr 1.0.2 2016-06-20 CRAN (R 3.4.0)
XML 3.98-1.7 2017-05-03 CRAN (R 3.4.0)
xtable 1.8-2 2016-02-05 CRAN (R 3.4.0)
XVector 0.16.0 2017-04-25 Bioconductor
zlibbioc 1.22.0 2017-04-25 Bioconductor
>
Hi, thanks for your fast response. In the future I will post to the Bioconductor forum for questions like this. I did mean to say colData, not rowData, in my question; I apologize for the confusion and have updated the title of the issue.
No problem and have a good day ^^