Unable to access data set to reproduce analysis
I have been trying to download the GSE73721 dataset (which is listed on the recount website), but I cannot find it using the recount library.
Below are the relevant commands I ran in the session:
library(recount)
project_info <- abstract_search('GSE32465')
project_info
number_samples species
340 12 human
project_info <- abstract_search('GSE73721')
project_info
[1] number_samples species abstract project
<0 rows> (or 0-length row.names)
project_info <- abstract_search('SRP064454')
project_info
[1] number_samples species abstract project
<0 rows> (or 0-length row.names)
The data is available on the recount website. I would be grateful for your help.
(below is the csv file from the recount website)
accession,number of samples,species,abstract
SRP064454,41,human,"Astrocytes were purified from fetal and adult human brain tissue using an immunopanning method with the HepaCAM antibody. Samples were taken from otherwise 'healthy' pieces of tissue, unless otherwise specified. Overall design: 6 fetal astrocyte samples, 12 adult astrocyte samples, 8 GBM or sclerotic hippocampal samples, 4 whole human cortex samples, 4 adult mouse astrocyte samples, and 11 human samples of other purified CNS cell types"
Thanks,
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.1 (2016-06-21)
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
tz
date 2016-12-19
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.4.1 2016-10-29 CRAN (R 3.3.1)
AnnotationDbi 1.34.4 2016-10-06 Bioconductor
assertthat 0.1 2013-12-06 CRAN (R 3.3.1)
Biobase * 2.32.0 2016-05-16 Bioconductor
BiocGenerics * 0.18.0 2016-05-16 Bioconductor
BiocParallel 1.6.6 2016-12-02 Bioconductor
biomaRt 2.28.0 2016-09-03 Bioconductor
Biostrings 2.40.2 2016-08-10 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.3.0)
BSgenome 1.40.1 2016-12-02 Bioconductor
bumphunter 1.12.0 2016-05-16 Bioconductor
cluster 2.0.5 2016-10-08 CRAN (R 3.3.1)
codetools 0.2-15 2016-10-05 CRAN (R 3.3.1)
colorspace 1.3-1 2016-11-18 CRAN (R 3.3.1)
data.table 1.9.8 2016-11-25 CRAN (R 3.3.1)
DBI 0.5-1 2016-09-10 CRAN (R 3.3.1)
derfinder 1.8.0 2016-12-18 Bioconductor
derfinderHelper 1.6.3 2016-05-17 Bioconductor
devtools 1.12.0 2016-06-24 CRAN (R 3.3.1)
digest 0.6.10 2016-08-02 CRAN (R 3.1.0)
doRNG 1.6 2014-03-07 CRAN (R 3.3.0)
downloader 0.4 2015-07-09 CRAN (R 3.3.0)
foreach 1.4.3 2015-10-13 CRAN (R 3.3.0)
foreign 0.8-67 2016-09-13 CRAN (R 3.3.1)
Formula 1.2-1 2015-04-07 CRAN (R 3.3.0)
GenomeInfoDb * 1.8.7 2016-12-02 Bioconductor
GenomicAlignments 1.8.4 2016-12-02 Bioconductor
GenomicFeatures 1.24.5 2016-12-02 Bioconductor
GenomicFiles 1.8.0 2016-05-12 Bioconductor
GenomicRanges * 1.24.3 2016-12-02 Bioconductor
GEOquery 2.38.4 2016-05-17 Bioconductor
ggplot2 2.2.0 2016-11-11 CRAN (R 3.3.1)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.1.0)
Hmisc 4.0-0 2016-11-01 CRAN (R 3.3.1)
htmlTable 1.7 2016-10-19 CRAN (R 3.3.1)
htmltools 0.3.5 2016-03-21 CRAN (R 3.3.1)
httr 1.2.1 2016-07-03 CRAN (R 3.3.1)
IRanges * 2.6.1 2016-12-02 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.3.0)
jsonlite 1.1 2016-09-14 CRAN (R 3.3.1)
knitr 1.15.1 2016-11-22 CRAN (R 3.3.1)
lattice 0.20-34 2016-09-06 CRAN (R 3.3.1)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.0)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.1.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.3.0)
magrittr 1.5 2014-11-22 CRAN (R 3.1.0)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.3.1)
matrixStats 0.51.0 2016-10-09 CRAN (R 3.3.1)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.1.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.3.1)
pkgmaker 0.22 2014-05-14 CRAN (R 3.3.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.3.1)
qvalue 2.4.2 2016-05-17 Bioconductor
R6 2.2.0 2016-10-05 CRAN (R 3.3.1)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.1.0)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.3.1)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.1)
recount * 1.0.6 2016-12-18 Bioconductor
registry 0.3 2015-07-08 CRAN (R 3.3.0)
rentrez 1.0.4 2016-10-26 CRAN (R 3.3.1)
reshape2 1.4.2 2016-10-22 CRAN (R 3.3.1)
rngtools 1.2.4 2014-03-06 CRAN (R 3.3.0)
rpart 4.1-10 2015-06-29 CRAN (R 3.3.0)
Rsamtools 1.26.1 2016-12-18 Bioconductor
RSQLite 1.1 2016-11-27 CRAN (R 3.3.1)
rtracklayer 1.34.1 2016-12-18 Bioconductor
S4Vectors * 0.10.3 2016-09-27 Bioconductor
scales 0.4.1 2016-11-09 CRAN (R 3.3.1)
stringi 1.1.2 2016-10-01 CRAN (R 3.3.1)
stringr 1.1.0 2016-08-19 CRAN (R 3.3.1)
SummarizedExperiment * 1.2.3 2016-12-02 Bioconductor
survival 2.40-1 2016-10-30 CRAN (R 3.3.1)
tibble 1.2 2016-08-26 CRAN (R 3.3.1)
VariantAnnotation 1.18.7 2016-12-02 Bioconductor
withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
XML 3.98-1.5 2016-11-10 CRAN (R 3.3.1)
xtable 1.8-2 2016-02-05 CRAN (R 3.1.0)
XVector 0.12.1 2016-12-02 Bioconductor
zlibbioc 1.18.0 2016-05-16 Bioconductor
Hi,
The recount package is working properly. Basically, abstract_search() searches the abstracts for text words. The example searches for a GEO identifier, GSE32465, because it's mentioned in the abstract for that particular study. This does not mean that all GEO identifiers can be used to search projects. In your particular case, you already know the project id, so you don't need to use abstract_search(). Simply use download_study() directly as shown below.
Best,
Leonardo
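To illustrate why one GEO identifier returns a hit and the other does not, here is a minimal base R sketch of text matching against abstracts. It is a toy model and not recount's actual implementation: the table toy_abstracts, the helper toy_abstract_search(), the first abstract's wording, and the project id 'SRP_TOY_1' are all made up for illustration, and case-insensitive matching is an assumption.

```r
## Toy stand-in for an abstract table; NOT the real recount metadata.
## The first abstract and its project id are hypothetical placeholders.
toy_abstracts <- data.frame(
    number_samples = c(12, 41),
    species = c('human', 'human'),
    abstract = c(
        'Placeholder abstract text that cites GSE32465 explicitly.',
        'Astrocytes were purified from fetal and adult human brain tissue.'
    ),
    project = c('SRP_TOY_1', 'SRP064454'),
    stringsAsFactors = FALSE
)

## Hypothetical helper: keep the rows whose abstract contains the query text
toy_abstract_search <- function(query, abstracts = toy_abstracts) {
    hit <- grepl(query, abstracts$abstract, ignore.case = TRUE)
    abstracts[hit, , drop = FALSE]
}

## The identifier appears in one abstract, so one row comes back
nrow(toy_abstract_search('GSE32465'))

## No abstract mentions this identifier, so zero rows come back
nrow(toy_abstract_search('GSE73721'))

## Searching for a word that is in the abstract finds the project
toy_abstract_search('astrocytes')$project
```

In this toy table the search only sees the abstract text, so a study is found by its GEO identifier only if the authors happened to write that identifier in the abstract. Searching for a word from the abstract, or using the known project id directly, avoids the problem.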
Clean code
suppressMessages(library('recount'))
## One abstract mentions GSE32465 in the text
dim(abstract_search('GSE32465'))
## No abstract mentions GSE73721 in the text
abstract_search('GSE73721')
## Since you already know the project number, you can use that directly
download_study('SRP064454')
## Then load the data
load(file.path('SRP064454', 'rse_gene.Rdata'))
rse_gene
options(width = 120)
devtools::session_info()
Evaluated code
> suppressMessages(library('recount'))
>
> ## One abstract mentions GSE32465 in the text
> dim(abstract_search('GSE32465'))
[1] 1 4
>
> ## No abstract mentions GSE73721 in the text
> abstract_search('GSE73721')
[1] number_samples species abstract project
<0 rows> (or 0-length row.names)
> download_study('SRP064454')
2016-12-20 10:16:33 downloading file rse_gene.Rdata to SRP064454
trying URL 'http://duffel.rail.bio/recount/SRP064454/rse_gene.Rdata'
Content type 'application/octet-stream' length 3044718 bytes (2.9 MB)
==================================================
downloaded 2.9 MB
> load(file.path('SRP064454', 'rse_gene.Rdata'))
> rse_gene
class: RangedSummarizedExperiment
dim: 23779 41
metadata(0):
assays(1): counts
rownames(23779): 1 10 ... 9994 9997
rowData names(3): gene_id bp_length symbol
colnames(41): SRR2557127 SRR2557125 ... SRR2557085 SRR2557083
colData names(21): project sample ... title characteristics
> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R Under development (unstable) (2016-10-26 r71594)
system x86_64, darwin13.4.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2016-12-20
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.4.1 2016-10-29 CRAN (R 3.4.0)
AnnotationDbi 1.37.0 2016-10-26 Bioconductor
assertthat 0.1 2013-12-06 CRAN (R 3.4.0)
Biobase * 2.35.0 2016-10-23 Bioconductor
BiocGenerics * 0.21.1 2016-12-01 Bioconductor
BiocParallel 1.9.2 2016-11-18 Bioconductor
biomaRt 2.31.3 2016-12-01 Bioconductor
Biostrings 2.43.1 2016-11-17 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.4.0)
BSgenome 1.43.1 2016-11-11 Bioconductor
bumphunter 1.15.0 2016-10-23 Bioconductor
cluster 2.0.5 2016-10-08 CRAN (R 3.4.0)
codetools 0.2-15 2016-10-05 CRAN (R 3.4.0)
colorspace 1.3-1 2016-11-18 CRAN (R 3.4.0)
data.table 1.10.0 2016-12-03 CRAN (R 3.4.0)
DBI 0.5-1 2016-09-10 CRAN (R 3.4.0)
derfinder 1.9.5 2016-11-30 Bioconductor
derfinderHelper 1.9.3 2016-11-29 Bioconductor
devtools 1.12.0 2016-06-24 CRAN (R 3.4.0)
digest 0.6.10 2016-08-02 CRAN (R 3.4.0)
doRNG 1.6 2014-03-07 CRAN (R 3.4.0)
downloader 0.4 2015-07-09 CRAN (R 3.4.0)
foreach 1.4.3 2015-10-13 CRAN (R 3.4.0)
foreign 0.8-67 2016-09-13 CRAN (R 3.4.0)
Formula 1.2-1 2015-04-07 CRAN (R 3.4.0)
GenomeInfoDb * 1.11.6 2016-11-17 Bioconductor
GenomicAlignments 1.11.4 2016-12-01 Bioconductor
GenomicFeatures 1.27.4 2016-12-01 Bioconductor
GenomicFiles 1.11.3 2016-11-29 Bioconductor
GenomicRanges * 1.27.15 2016-12-04 Bioconductor
GEOquery 2.41.0 2016-10-25 Bioconductor
ggplot2 2.2.0 2016-11-11 CRAN (R 3.4.0)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.4.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
Hmisc 4.0-0 2016-11-01 CRAN (R 3.4.0)
htmlTable 1.7 2016-10-19 CRAN (R 3.4.0)
htmltools 0.3.5 2016-03-21 CRAN (R 3.4.0)
httr 1.2.1 2016-07-03 CRAN (R 3.4.0)
IRanges * 2.9.13 2016-12-01 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.4.0)
jsonlite 1.1 2016-09-14 CRAN (R 3.4.0)
knitr 1.15.1 2016-11-22 CRAN (R 3.4.0)
lattice 0.20-34 2016-09-06 CRAN (R 3.4.0)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.4.0)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.4.0)
magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.4.0)
matrixStats 0.51.0 2016-10-09 CRAN (R 3.4.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.4.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.4.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.4.0)
pkgmaker 0.22 2014-05-14 CRAN (R 3.4.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
qvalue 2.7.0 2016-10-23 Bioconductor
R6 2.2.0 2016-10-05 CRAN (R 3.4.0)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.4.0)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.4.0)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.4.0)
recount * 1.1.7 2016-11-29 Bioconductor
registry 0.3 2015-07-08 CRAN (R 3.4.0)
rentrez 1.0.4 2016-10-26 CRAN (R 3.4.0)
reshape2 1.4.2 2016-10-22 CRAN (R 3.4.0)
rngtools 1.2.4 2014-03-06 CRAN (R 3.4.0)
rpart 4.1-10 2015-06-29 CRAN (R 3.4.0)
Rsamtools 1.27.5 2016-12-01 Bioconductor
RSQLite 1.1 2016-11-27 CRAN (R 3.4.0)
rtracklayer 1.35.1 2016-10-29 Bioconductor
S4Vectors * 0.13.5 2016-12-01 Bioconductor
scales 0.4.1 2016-11-09 CRAN (R 3.4.0)
stringi 1.1.2 2016-10-01 CRAN (R 3.4.0)
stringr 1.1.0 2016-08-19 CRAN (R 3.4.0)
SummarizedExperiment * 1.5.3 2016-11-11 Bioconductor
survival 2.40-1 2016-10-30 CRAN (R 3.4.0)
tibble 1.2 2016-08-26 CRAN (R 3.4.0)
VariantAnnotation 1.21.10 2016-12-01 Bioconductor
withr 1.0.2 2016-06-20 CRAN (R 3.4.0)
XML 3.98-1.5 2016-11-10 CRAN (R 3.4.0)
xtable 1.8-2 2016-02-05 CRAN (R 3.4.0)
XVector 0.15.0 2016-10-23 Bioconductor
zlibbioc 1.21.0 2016-10-23 Bioconductor
>