Quickly download fastq files associated with a GEO series

GEOfastq

Install GEOfastq

To download and install GEOfastq:

install.packages('remotes')
remotes::install_github('alexvpickering/GEOfastq')

Install Aspera Connect (optional)

GEOfastq can use Aspera Connect to download fastqs. It is faster than FTP for large single-file downloads (such as single-cell fastqs). Download and install it according to the documentation. For me (Fedora 30), this works:

wget https://download.asperasoft.com/download/sw/connect/3.9.6/ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.tar.gz
tar -zxvf ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.tar.gz
./ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.sh

I also had to make sure ascp was on the PATH:

echo 'export PATH=$HOME/.aspera/connect/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
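
To check from the shell that the PATH change took effect, something like the following minimal sketch should work. `check_on_path` is a hypothetical helper name used only for illustration; it is not part of Aspera Connect or GEOfastq.

```shell
# Report whether a command can be resolved on the current PATH.
# `check_on_path` is a hypothetical helper, not part of Aspera or GEOfastq.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at: $(command -v "$1")"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# If ascp is missing, re-check the export line added to ~/.bashrc.
check_on_path ascp || echo "ascp is not on PATH yet; check ~/.bashrc"
```

If ascp is reported as missing in a terminal that was already open, opening a new terminal (or running `source ~/.bashrc` again) is usually enough.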

For RStudio to find ascp on the PATH, I also had to add this to a .Renviron:

echo 'PATH=${HOME}/.aspera/connect/bin:${PATH}' >> ~/.Renviron
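
Running the echo command more than once appends the same line again each time. A minimal sketch of an append that skips lines already present is shown below. `append_once` is a hypothetical helper name, and the demo writes to a temporary file rather than a real .Renviron.

```shell
# Append a line to a file only if that exact line is not already in it.
# `append_once` is a hypothetical helper used for illustration.
append_once() {
  line=$1
  file=$2
  touch "$file"
  # -x: match whole lines only, -F: treat the pattern as a fixed string
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Demo on a temporary file: the second call is a no-op.
demo_file=$(mktemp)
append_once 'PATH=${HOME}/.aspera/connect/bin:${PATH}' "$demo_file"
append_once 'PATH=${HOME}/.aspera/connect/bin:${PATH}' "$demo_file"
cat "$demo_file"   # the line appears once
rm -f "$demo_file"
```

For real use, pass the .Renviron that R reads on startup as the second argument, for example "$HOME/.Renviron" (assuming a user-level file).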

After restarting RStudio, confirm that things are set up properly:

# should have the above path added
Sys.getenv('PATH')

# should print info about Aspera Connect
system2('ascp', '--version')

Install docker image

To install GEOfastq and Aspera Connect from a pre-built docker image:

# retrieve pre-built geofastq docker image
docker pull alexvpickering/geofastq

# run an interactive container; in `-v host:container`, set the host
# portion to the directory where you want to persist data
sudo docker run -it --rm \
  -v /srv:/srv \
  alexvpickering/geofastq /bin/bash

Usage

First crawl a study page on GEO to get study metadata and corresponding fastq.gz download links on ENA:

library(GEOfastq)

gse_name <- 'GSE117570'
gse_text <- crawl_gse(gse_name)
gsm_names <- extract_gsms(gse_text)
srp_meta <- crawl_gsms(gsm_names)

Next, subset srp_meta to samples that you want, then download:

srp_meta <- srp_meta[srp_meta$source_name == 'Adjacent normal', ]

# increase the download timeout (in seconds) used by utils::download.file
options(timeout=1e6)

get_fastqs(srp_meta, data_dir = tempdir())

That's all folks! GOTO: kallisto?