leekgroup/recount

Unable to access data set to reproduce analysis

I have been trying to download the GSE73721 dataset (which is listed on the recount website), but I cannot find it using the recount package.

Below are the relevant commands I ran in the session:

> library(recount)
> project_info <- abstract_search('GSE32465')
> project_info
number_samples species
340 12 human

> project_info <- abstract_search('GSE73721')
> project_info
[1] number_samples species abstract project
<0 rows> (or 0-length row.names)

> project_info <- abstract_search('SRP064454')
> project_info
[1] number_samples species abstract project
<0 rows> (or 0-length row.names)

The dataset is listed on the recount website, so I would be grateful for your help.
Below is the corresponding row of the CSV file from the recount website:

accession,number of samples,species,abstract
SRP064454,41,human,"Astrocytes were purified from fetal and adult human brain tissue using an immunopanning method with the HepaCAM antibody. Samples were taken from otherwise 'healthy' pieces of tissue, unless otherwise specified. Overall design: 6 fetal astrocyte samples, 12 adult astrocyte samples, 8 GBM or sclerotic hippocampal samples, 4 whole human cortex samples, 4 adult mouse astrocyte samples, and 11 human samples of other purified CNS cell types"

Thanks,

> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.1 (2016-06-21)
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
tz
date 2016-12-19

Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.4.1 2016-10-29 CRAN (R 3.3.1)
AnnotationDbi 1.34.4 2016-10-06 Bioconductor
assertthat 0.1 2013-12-06 CRAN (R 3.3.1)
Biobase * 2.32.0 2016-05-16 Bioconductor
BiocGenerics * 0.18.0 2016-05-16 Bioconductor
BiocParallel 1.6.6 2016-12-02 Bioconductor
biomaRt 2.28.0 2016-09-03 Bioconductor
Biostrings 2.40.2 2016-08-10 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.3.0)
BSgenome 1.40.1 2016-12-02 Bioconductor
bumphunter 1.12.0 2016-05-16 Bioconductor
cluster 2.0.5 2016-10-08 CRAN (R 3.3.1)
codetools 0.2-15 2016-10-05 CRAN (R 3.3.1)
colorspace 1.3-1 2016-11-18 CRAN (R 3.3.1)
data.table 1.9.8 2016-11-25 CRAN (R 3.3.1)
DBI 0.5-1 2016-09-10 CRAN (R 3.3.1)
derfinder 1.8.0 2016-12-18 Bioconductor
derfinderHelper 1.6.3 2016-05-17 Bioconductor
devtools 1.12.0 2016-06-24 CRAN (R 3.3.1)
digest 0.6.10 2016-08-02 CRAN (R 3.1.0)
doRNG 1.6 2014-03-07 CRAN (R 3.3.0)
downloader 0.4 2015-07-09 CRAN (R 3.3.0)
foreach 1.4.3 2015-10-13 CRAN (R 3.3.0)
foreign 0.8-67 2016-09-13 CRAN (R 3.3.1)
Formula 1.2-1 2015-04-07 CRAN (R 3.3.0)
GenomeInfoDb * 1.8.7 2016-12-02 Bioconductor
GenomicAlignments 1.8.4 2016-12-02 Bioconductor
GenomicFeatures 1.24.5 2016-12-02 Bioconductor
GenomicFiles 1.8.0 2016-05-12 Bioconductor
GenomicRanges * 1.24.3 2016-12-02 Bioconductor
GEOquery 2.38.4 2016-05-17 Bioconductor
ggplot2 2.2.0 2016-11-11 CRAN (R 3.3.1)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.1.0)
Hmisc 4.0-0 2016-11-01 CRAN (R 3.3.1)
htmlTable 1.7 2016-10-19 CRAN (R 3.3.1)
htmltools 0.3.5 2016-03-21 CRAN (R 3.3.1)
httr 1.2.1 2016-07-03 CRAN (R 3.3.1)
IRanges * 2.6.1 2016-12-02 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.3.0)
jsonlite 1.1 2016-09-14 CRAN (R 3.3.1)
knitr 1.15.1 2016-11-22 CRAN (R 3.3.1)
lattice 0.20-34 2016-09-06 CRAN (R 3.3.1)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.0)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.1.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.3.0)
magrittr 1.5 2014-11-22 CRAN (R 3.1.0)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.3.1)
matrixStats 0.51.0 2016-10-09 CRAN (R 3.3.1)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.1.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.3.1)
pkgmaker 0.22 2014-05-14 CRAN (R 3.3.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.3.1)
qvalue 2.4.2 2016-05-17 Bioconductor
R6 2.2.0 2016-10-05 CRAN (R 3.3.1)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.1.0)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.3.1)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.1)
recount * 1.0.6 2016-12-18 Bioconductor
registry 0.3 2015-07-08 CRAN (R 3.3.0)
rentrez 1.0.4 2016-10-26 CRAN (R 3.3.1)
reshape2 1.4.2 2016-10-22 CRAN (R 3.3.1)
rngtools 1.2.4 2014-03-06 CRAN (R 3.3.0)
rpart 4.1-10 2015-06-29 CRAN (R 3.3.0)
Rsamtools 1.26.1 2016-12-18 Bioconductor
RSQLite 1.1 2016-11-27 CRAN (R 3.3.1)
rtracklayer 1.34.1 2016-12-18 Bioconductor
S4Vectors * 0.10.3 2016-09-27 Bioconductor
scales 0.4.1 2016-11-09 CRAN (R 3.3.1)
stringi 1.1.2 2016-10-01 CRAN (R 3.3.1)
stringr 1.1.0 2016-08-19 CRAN (R 3.3.1)
SummarizedExperiment * 1.2.3 2016-12-02 Bioconductor
survival 2.40-1 2016-10-30 CRAN (R 3.3.1)
tibble 1.2 2016-08-26 CRAN (R 3.3.1)
VariantAnnotation 1.18.7 2016-12-02 Bioconductor
withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
XML 3.98-1.5 2016-11-10 CRAN (R 3.3.1)
xtable 1.8-2 2016-02-05 CRAN (R 3.1.0)
XVector 0.12.1 2016-12-02 Bioconductor
zlibbioc 1.18.0 2016-05-16 Bioconductor

Hi,

The recount package is working properly. abstract_search() searches the text of the study abstracts for the query you give it. The example uses a GEO identifier, GSE32465, only because that identifier happens to be mentioned in the abstract of that particular study. This does not mean that every GEO identifier can be used to find a project: no abstract mentions GSE73721 or SRP064454, so both searches return zero rows. In your case you already know the project id (SRP064454), so you don't need abstract_search() at all. Simply use download_study() directly, as shown below.
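To make the distinction concrete, here is a minimal sketch in base R of what "searching the abstracts for text" implies. It is not recount's actual code: the table `toy_abstracts`, its two rows, the identifiers in it, and the helper `toy_abstract_search()` are all made up for illustration. The only assumption taken from the output above is that the metadata has the columns number_samples, species, abstract and project.

```r
## Toy stand-in for the project metadata (hypothetical rows, not real recount data)
toy_abstracts <- data.frame(
    number_samples = c(12, 41),
    species = c('human', 'human'),
    abstract = c(
        'Example abstract that cites its GEO series GSE00001 in the text.',
        'Example abstract that never mentions any GEO identifier.'
    ),
    project = c('SRP000001', 'SRP000002'),
    stringsAsFactors = FALSE
)

## Hypothetical helper: keep only the rows whose abstract contains the query text
toy_abstract_search <- function(query, metadata = toy_abstracts) {
    hits <- grepl(query, metadata$abstract, ignore.case = TRUE)
    metadata[hits, , drop = FALSE]
}

## The identifier appears in the first abstract, so that row is returned
toy_abstract_search('GSE00001')$project

## No abstract contains this identifier, so the result has zero rows
nrow(toy_abstract_search('GSE00002'))

## A project id is not abstract text either; match it on the project column instead
toy_abstracts[toy_abstracts$project == 'SRP000002', c('project', 'number_samples')]
```

In this toy setting a text search can only find an identifier that an abstract's author happened to write out, which is why a known project id is better used directly.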

Best,
Leonardo

Clean code

suppressMessages(library('recount'))

## One abstract mentions GSE32465 in the text
dim(abstract_search('GSE32465'))

## No abstract mentions GSE73721 in the text
abstract_search('GSE73721')

## Since you already know the project number, you can use that directly
download_study('SRP064454')

## Then load the data
load(file.path('SRP064454', 'rse_gene.Rdata'))
rse_gene

options(width = 120)
devtools::session_info()

Evaluated code

> suppressMessages(library('recount'))
> 
> ## One abstract mentions GSE32465 in the text
> dim(abstract_search('GSE32465'))
[1] 1 4
> 
> ## No abstract mentions GSE73721 in the text
> abstract_search('GSE73721')
[1] number_samples species        abstract       project       
<0 rows> (or 0-length row.names)
> download_study('SRP064454')
2016-12-20 10:16:33 downloading file rse_gene.Rdata to SRP064454
trying URL 'http://duffel.rail.bio/recount/SRP064454/rse_gene.Rdata'
Content type 'application/octet-stream' length 3044718 bytes (2.9 MB)
==================================================
downloaded 2.9 MB

> load(file.path('SRP064454', 'rse_gene.Rdata'))
> rse_gene
class: RangedSummarizedExperiment 
dim: 23779 41 
metadata(0):
assays(1): counts
rownames(23779): 1 10 ... 9994 9997
rowData names(3): gene_id bp_length symbol
colnames(41): SRR2557127 SRR2557125 ... SRR2557085 SRR2557083
colData names(21): project sample ... title characteristics
> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value                                             
 version  R Under development (unstable) (2016-10-26 r71594)
 system   x86_64, darwin13.4.0                              
 ui       AQUA                                              
 language (EN)                                              
 collate  en_US.UTF-8                                       
 tz       America/New_York                                  
 date     2016-12-20                                        

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source        
 acepack                1.4.1    2016-10-29 CRAN (R 3.4.0)
 AnnotationDbi          1.37.0   2016-10-26 Bioconductor  
 assertthat             0.1      2013-12-06 CRAN (R 3.4.0)
 Biobase              * 2.35.0   2016-10-23 Bioconductor  
 BiocGenerics         * 0.21.1   2016-12-01 Bioconductor  
 BiocParallel           1.9.2    2016-11-18 Bioconductor  
 biomaRt                2.31.3   2016-12-01 Bioconductor  
 Biostrings             2.43.1   2016-11-17 Bioconductor  
 bitops                 1.0-6    2013-08-17 CRAN (R 3.4.0)
 BSgenome               1.43.1   2016-11-11 Bioconductor  
 bumphunter             1.15.0   2016-10-23 Bioconductor  
 cluster                2.0.5    2016-10-08 CRAN (R 3.4.0)
 codetools              0.2-15   2016-10-05 CRAN (R 3.4.0)
 colorspace             1.3-1    2016-11-18 CRAN (R 3.4.0)
 data.table             1.10.0   2016-12-03 CRAN (R 3.4.0)
 DBI                    0.5-1    2016-09-10 CRAN (R 3.4.0)
 derfinder              1.9.5    2016-11-30 Bioconductor  
 derfinderHelper        1.9.3    2016-11-29 Bioconductor  
 devtools               1.12.0   2016-06-24 CRAN (R 3.4.0)
 digest                 0.6.10   2016-08-02 CRAN (R 3.4.0)
 doRNG                  1.6      2014-03-07 CRAN (R 3.4.0)
 downloader             0.4      2015-07-09 CRAN (R 3.4.0)
 foreach                1.4.3    2015-10-13 CRAN (R 3.4.0)
 foreign                0.8-67   2016-09-13 CRAN (R 3.4.0)
 Formula                1.2-1    2015-04-07 CRAN (R 3.4.0)
 GenomeInfoDb         * 1.11.6   2016-11-17 Bioconductor  
 GenomicAlignments      1.11.4   2016-12-01 Bioconductor  
 GenomicFeatures        1.27.4   2016-12-01 Bioconductor  
 GenomicFiles           1.11.3   2016-11-29 Bioconductor  
 GenomicRanges        * 1.27.15  2016-12-04 Bioconductor  
 GEOquery               2.41.0   2016-10-25 Bioconductor  
 ggplot2                2.2.0    2016-11-11 CRAN (R 3.4.0)
 gridExtra              2.2.1    2016-02-29 CRAN (R 3.4.0)
 gtable                 0.2.0    2016-02-26 CRAN (R 3.4.0)
 Hmisc                  4.0-0    2016-11-01 CRAN (R 3.4.0)
 htmlTable              1.7      2016-10-19 CRAN (R 3.4.0)
 htmltools              0.3.5    2016-03-21 CRAN (R 3.4.0)
 httr                   1.2.1    2016-07-03 CRAN (R 3.4.0)
 IRanges              * 2.9.13   2016-12-01 Bioconductor  
 iterators              1.0.8    2015-10-13 CRAN (R 3.4.0)
 jsonlite               1.1      2016-09-14 CRAN (R 3.4.0)
 knitr                  1.15.1   2016-11-22 CRAN (R 3.4.0)
 lattice                0.20-34  2016-09-06 CRAN (R 3.4.0)
 latticeExtra           0.6-28   2016-02-09 CRAN (R 3.4.0)
 lazyeval               0.2.0    2016-06-12 CRAN (R 3.4.0)
 locfit                 1.5-9.1  2013-04-20 CRAN (R 3.4.0)
 magrittr               1.5      2014-11-22 CRAN (R 3.4.0)
 Matrix                 1.2-7.1  2016-09-01 CRAN (R 3.4.0)
 matrixStats            0.51.0   2016-10-09 CRAN (R 3.4.0)
 memoise                1.0.0    2016-01-29 CRAN (R 3.4.0)
 munsell                0.4.3    2016-02-13 CRAN (R 3.4.0)
 nnet                   7.3-12   2016-02-02 CRAN (R 3.4.0)
 pkgmaker               0.22     2014-05-14 CRAN (R 3.4.0)
 plyr                   1.8.4    2016-06-08 CRAN (R 3.4.0)
 qvalue                 2.7.0    2016-10-23 Bioconductor  
 R6                     2.2.0    2016-10-05 CRAN (R 3.4.0)
 RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.4.0)
 Rcpp                   0.12.8   2016-11-17 CRAN (R 3.4.0)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.4.0)
 recount              * 1.1.7    2016-11-29 Bioconductor  
 registry               0.3      2015-07-08 CRAN (R 3.4.0)
 rentrez                1.0.4    2016-10-26 CRAN (R 3.4.0)
 reshape2               1.4.2    2016-10-22 CRAN (R 3.4.0)
 rngtools               1.2.4    2014-03-06 CRAN (R 3.4.0)
 rpart                  4.1-10   2015-06-29 CRAN (R 3.4.0)
 Rsamtools              1.27.5   2016-12-01 Bioconductor  
 RSQLite                1.1      2016-11-27 CRAN (R 3.4.0)
 rtracklayer            1.35.1   2016-10-29 Bioconductor  
 S4Vectors            * 0.13.5   2016-12-01 Bioconductor  
 scales                 0.4.1    2016-11-09 CRAN (R 3.4.0)
 stringi                1.1.2    2016-10-01 CRAN (R 3.4.0)
 stringr                1.1.0    2016-08-19 CRAN (R 3.4.0)
 SummarizedExperiment * 1.5.3    2016-11-11 Bioconductor  
 survival               2.40-1   2016-10-30 CRAN (R 3.4.0)
 tibble                 1.2      2016-08-26 CRAN (R 3.4.0)
 VariantAnnotation      1.21.10  2016-12-01 Bioconductor  
 withr                  1.0.2    2016-06-20 CRAN (R 3.4.0)
 XML                    3.98-1.5 2016-11-10 CRAN (R 3.4.0)
 xtable                 1.8-2    2016-02-05 CRAN (R 3.4.0)
 XVector                0.15.0   2016-10-23 Bioconductor  
 zlibbioc               1.21.0   2016-10-23 Bioconductor  
>