some metadata are not up to date?
crazyhottommy opened this issue · 6 comments
Hi,
Thanks for this useful tool:
library(ENCODExplorer)
data(encode_df, package = "ENCODExplorer")
query_results_melanocyte <- queryEncode(df=encode_df, organism = "Homo sapiens",
biosample_name = c("foreskin melanocyte"), file_format = "fastq", fixed = FALSE,
assay = "ChIP-seq")
> query_results_melanocyte
Empty data.table (0 rows) of 73 cols: accession,file_accession,file_type,file_format,file_size,output_category...
> devtools::session_info()
Session info ---------------------------------------------------------------------------------------
setting value
version R version 3.4.2 (2017-09-28)
system x86_64, darwin15.6.0
ui RStudio (1.0.153)
language (EN)
collate en_US.UTF-8
tz America/Chicago
date 2018-01-11
Packages -------------------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 cran (@0.2.0)
base * 3.4.2 2017-10-04 local
bindr 0.1 2016-11-13 cran (@0.1)
bindrcpp * 0.2 2017-06-17 cran (@0.2)
BiocInstaller * 1.28.0 2017-10-31 Bioconductor
bitops 1.0-6 2013-08-17 cran (@1.0-6)
compiler 3.4.2 2017-10-04 local
data.table 1.10.4-3 2017-10-27 cran (@1.10.4-)
datasets * 3.4.2 2017-10-04 local
devtools 1.13.3 2017-08-02 CRAN (R 3.4.1)
digest 0.6.12 2017-01-27 CRAN (R 3.4.0)
dplyr 0.7.4 2017-09-28 cran (@0.7.4)
DT * 0.2 2016-08-09 CRAN (R 3.4.0)
ENCODExplorer * 2.4.0 2017-10-31 Bioconductor
glue 1.2.0 2017-10-29 cran (@1.2.0)
graphics * 3.4.2 2017-10-04 local
grDevices * 3.4.2 2017-10-04 local
htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
htmlwidgets 0.9 2017-07-10 CRAN (R 3.4.1)
httpuv 1.3.5 2017-07-04 CRAN (R 3.4.1)
jsonlite 1.5 2017-06-01 CRAN (R 3.4.0)
magrittr 1.5 2014-11-22 cran (@1.5)
memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
methods * 3.4.2 2017-10-04 local
mime 0.5 2016-07-07 CRAN (R 3.4.0)
parallel 3.4.2 2017-10-04 local
pkgconfig 2.0.1 2017-03-21 cran (@2.0.1)
purrr 0.2.4 2017-10-18 CRAN (R 3.4.2)
R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
Rcpp 0.12.14 2017-11-23 cran (@0.12.14)
RCurl 1.95-4.8 2016-03-01 cran (@1.95-4.)
rlang 0.1.4 2017-11-05 cran (@0.1.4)
shiny * 1.0.5 2017-08-23 CRAN (R 3.4.1)
shinythemes * 1.1.1 2016-10-12 CRAN (R 3.4.0)
stats * 3.4.2 2017-10-04 local
stringi 1.1.6 2017-11-17 cran (@1.1.6)
stringr 1.2.0 2017-02-18 cran (@1.2.0)
tibble 1.3.4 2017-08-22 cran (@1.3.4)
tidyr 0.7.2 2017-10-16 cran (@0.7.2)
tools 3.4.2 2017-10-04 local
utils * 3.4.2 2017-10-04 local
withr 2.0.0 2017-07-28 CRAN (R 3.4.1)
xtable 1.8-2 2016-02-05 cran (@1.8-2)
The query returns nothing, but when I go to the ENCODE site I can see that the fastqs are there: https://www.encodeproject.org/search/?type=Experiment&assay_title=ChIP-seq&target.investigated_as=histone+modification&files.file_type=fastq&biosample_type=primary+cell&biosample_term_name=foreskin+melanocyte&biosample_term_name=foreskin+melanocyte
Thank you for looking into this.
Best,
Tommy
Hello Tommy,
Thank you for your interest in ENCODExplorer!
The metadata is updated before each Bioconductor release, which means the metadata file tends to be outdated by the end of each release cycle.
I discussed this with the people at Bioconductor, and they recommend keeping a stable version for the complete release cycle to improve reproducibility.
This being said, it is possible to download all the tables from ENCODE and produce a new database that can be used by ENCODExplorer (see the Data Update vignette).
I will also prepare a new version and push it to this GitHub repository today if possible. That way you will be able to install the GitHub version with the latest metadata:
devtools::install_github("charlesjb/encodexplorer")
I checked the updated encode_df, but I can't find the biosample you are looking for. I'll investigate this further next week and will keep you up to date.
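For anyone hitting the same empty result, one quick check is to search the distinct biosample names in the metadata table directly, before adding the other filters. The following is only a minimal sketch: it uses a small made-up table in place of encode_df, assumes the real table has a biosample_name column (the queryEncode argument of the same name suggests so, but this is not verified for every release), and find_biosamples is a hypothetical helper, not part of ENCODExplorer.

```r
# Toy stand-in for encode_df; the rows below are invented for illustration.
toy_df <- data.frame(
  file_accession = c("F1", "F2", "F3"),
  biosample_name = c("foreskin melanocyte", "HepG2", "foreskin fibroblast"),
  file_format    = c("fastq", "bam", "fastq"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the distinct biosample names matching a pattern.
# Assumes the table has a `biosample_name` column.
find_biosamples <- function(df, pattern) {
  names_seen <- unique(df$biosample_name)
  grep(pattern, names_seen, ignore.case = TRUE, value = TRUE)
}

# An empty result here would mean the biosample is absent from the table,
# rather than being filtered out by file_format or assay.
hits <- find_biosamples(toy_df, "melanocyte")
print(hits)
```

With the real data, the same call would be find_biosamples(encode_df, "melanocyte"), again assuming that column name.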
OK, I pushed the new version to GitHub. You can install it with:
devtools::install_github("CharlesJB/ENCODExplorer")
I tested your initial query, which now returns 32 files.
I will also push the new version on the development branch soon.
Thank you very much! Just FYI, some fastqs are listed but not necessarily open to everyone.
For this data: https://www.encodeproject.org/search/?type=Experiment&assay_title=ChIP-seq&target.investigated_as=histone+modification&files.file_type=fastq&biosample_type=primary+cell&biosample_term_name=foreskin+melanocyte&biosample_term_name=foreskin+melanocyte
I cannot download the fastqs. I got this answer from the ENCODE DCC:
That data was generated by the Roadmap Epigenomics consortium and the raw reads are protected by dbGaP. The ENCODE DCC was granted access to the raw data through dbGaP to process and make only the pipeline results available to the community, not the raw data. We have displayed the restricted files as objects on the ENCODE portal with accurate metadata in the interest of data provenance in the case where a user does gain access to the raw data themselves.
You can inquire about access to the Costello lab-produced Roadmap data at this link: https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?adddataset=phs000791&consent=HMB&page=login …
Anyway, thanks for the update!
Tommy
Thanks for the info, I was not aware of this. I'll try to see if there is something I can do.
Great, and thanks again!