To download and install GEOfastq:
install.packages('remotes')
remotes::install_github('alexvpickering/GEOfastq')
GEOfastq can use Aspera Connect to download fastqs. Aspera is faster than FTP for large single-file downloads, such as single-cell fastqs.
Download and install Aspera Connect according to its documentation. For me (Fedora 30), this works:
wget https://download.asperasoft.com/download/sw/connect/3.9.6/ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.tar.gz
tar -zxvf ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.tar.gz
./ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.sh
I also had to make sure ascp was on the PATH:
echo 'export PATH=$HOME/.aspera/connect/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
For RStudio to find ascp on the PATH, I also had to add it to an .Renviron file (here the user-level one in the home directory):
echo 'PATH=${HOME}/.aspera/connect/bin:${PATH}' >> ~/.Renviron
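Because `>>` appends unconditionally, running either `echo` line a second time leaves a duplicate entry in the file. Below is a minimal bash sketch of an idempotent alternative, assuming bash plus standard `grep` and `tail`. `append_once` is a hypothetical helper, not part of GEOfastq or Aspera Connect:

```shell
#!/usr/bin/env bash
# append_once LINE FILE  (hypothetical helper, not part of GEOfastq)
# Appends LINE to FILE only if FILE does not already contain that exact line,
# so re-running the setup steps does not stack duplicate PATH entries.
append_once() {
  local line="$1" file="$2"
  # create FILE if it does not exist yet
  touch "$file"
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex), --: end of options
  if ! grep -qxF -- "$line" "$file"; then
    # if the file is non-empty and lacks a trailing newline, add one first
    # so the new line is not glued onto the previous one
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}
```

For example, `append_once 'export PATH=$HOME/.aspera/connect/bin:$PATH' ~/.bashrc` or `append_once 'PATH=${HOME}/.aspera/connect/bin:${PATH}' ~/.Renviron`. As with the `echo` lines, the single quotes keep `$HOME` and `$PATH` unexpanded, so they are resolved when the file is read.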
After restarting RStudio, confirm that things are set up properly:
# should have the above path added
Sys.getenv('PATH')
# should print info about Aspera Connect
system2('ascp', '--version')
To install GEOfastq and Aspera Connect from a pre-built Docker image:
# retrieve the pre-built GEOfastq Docker image
sudo docker pull alexvpickering/geofastq
# run an interactive container; in `-v host:container`, set the host
# path to the directory where you want downloaded data to persist
sudo docker run -it --rm \
  -v /srv:/srv \
  alexvpickering/geofastq /bin/bash
First, crawl a study page on GEO to get the study metadata and the corresponding fastq.gz download links on ENA:
library(GEOfastq)
gse_name <- 'GSE117570'
gse_text <- crawl_gse(gse_name)
gsm_names <- extract_gsms(gse_text)
srp_meta <- crawl_gsms(gsm_names)
Next, subset srp_meta to the samples that you want, then download:
srp_meta <- srp_meta[srp_meta$source_name == 'Adjacent normal', ]
# increase the timeout (in seconds) used by utils::download.file
options(timeout=1e6)
get_fastqs(srp_meta, data_dir = tempdir())
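A truncated download can look like a normal file. As a quick integrity check, here is a minimal bash sketch that runs `gzip -t` on every `*.fastq.gz` under a directory. It assumes you passed a persistent `data_dir` (rather than `tempdir()`, which R removes when the session ends) and that the downloaded files keep the `.fastq.gz` extension. `check_fastqs` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# check_fastqs DIR  (hypothetical helper, not part of GEOfastq)
# Tests every *.fastq.gz file under DIR with `gzip -t`, which decompresses
# without writing output and fails on truncated or corrupt archives.
# Returns non-zero if any file fails or if no files were found.
check_fastqs() {
  local dir="$1" f n=0 bad=0
  # -print0 / read -d '' handle file names containing spaces or newlines
  while IFS= read -r -d '' f; do
    n=$((n + 1))
    if ! gzip -t "$f" 2>/dev/null; then
      echo "FAILED: $f" >&2
      bad=$((bad + 1))
    fi
  done < <(find "$dir" -type f -name '*.fastq.gz' -print0)
  echo "checked $n file(s), $bad failed"
  [ "$n" -gt 0 ] && [ "$bad" -eq 0 ]
}
```

For example, after `get_fastqs(srp_meta, data_dir = '/srv/geo_data')`, run `check_fastqs /srv/geo_data`. The directory name is only an illustration. Any file reported as FAILED is worth downloading again before quantification.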
That's all folks! GOTO: kallisto?