Given a GEO or SRA accession, geofetch can 1) download either raw or processed data from either GEO or SRA and 2) produce a standardized PEP sample annotation sheet of public metadata. This makes it easy to run looper-compatible pipelines on public datasets, because geofetch handles data acquisition and metadata formatting and standardization for you.
This project is still pre-release but fully functional; some things may change in the near future.
geofetch has two components:
- geofetch/geofetch.py: a Python script that downloads metadata, produces PEP-compatible sample annotation files, and downloads .sra files (or processed data from GEO, if requested).
- sra_convert/sra_convert.py: a pypiper pipeline that converts SRA files into BAM files.
- Set the environment variables $SRARAW (where .sra files will live) and $SRABAM (where .bam files will live). geofetch uses these environment variables to determine where to store the .sra and .bam files.
- Download SRA data using geofetch.py. Run it like this:

    geofetch.py -i GSE#####

  This downloads all .sra files into your $SRARAW folder. For the full list of options, see the help menu:

    geofetch.py -h

  This also produces a sample annotation sheet (currently called annocomb_GSE#####.csv, in your $SRAMETA folder), which you will use as part of your PEP.
- With the .sra data downloaded, convert these files into a more usable format (.bam). Build a project configuration file (see sra_convert/example/project_config.yaml for an example) and point its sample_annotation to the annotation file produced earlier by geofetch.py.
- Run the sra_convert pipeline using looper with this command:

    looper run project_config.yaml --lump
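The quick-start steps above can be strung together in a small Python driver. This is a minimal sketch, not part of geofetch: the helper names (data_dirs, geofetch_command, annotation_path, project_config, looper_command, run_all), the minimal config layout under metadata:, and the assumption that geofetch.py and looper are on your PATH are all hypothetical. Only the two commands, the environment variables, and the annocomb_GSE#####.csv file name come from this README.

```python
# A hypothetical driver for the quick-start steps; not part of geofetch.
# Assumed: geofetch.py and looper are on PATH, and the minimal config
# layout in project_config() (copy any other keys from the example config).
import os
import re
import subprocess

# Simplification: only GEO series accessions, as in "geofetch.py -i GSE#####".
GSE_PATTERN = re.compile(r"^GSE\d+$")
REQUIRED_VARS = ("SRARAW", "SRABAM", "SRAMETA")


def data_dirs(env=None):
    """Return the folders named by $SRARAW, $SRABAM and $SRAMETA, failing early if unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("unset environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def geofetch_command(gse):
    """Build the download command from the README: geofetch.py -i GSE#####."""
    if not GSE_PATTERN.match(gse):
        raise ValueError(f"not a GSE accession: {gse!r}")
    return ["geofetch.py", "-i", gse]


def annotation_path(gse, meta_dir):
    """Path of the annotation sheet geofetch writes: $SRAMETA/annocomb_GSE#####.csv."""
    return os.path.join(meta_dir, f"annocomb_{gse}.csv")


def project_config(annotation_csv, output_dir):
    """Render a minimal project config; this layout is an ASSUMPTION, so compare
    it with sra_convert/example/project_config.yaml before relying on it."""
    return (
        "metadata:\n"
        f"  sample_annotation: {annotation_csv}\n"
        f"  output_dir: {output_dir}\n"
    )


def looper_command(config_path):
    """Build the conversion command from the README: looper run <config> --lump."""
    return ["looper", "run", config_path, "--lump"]


def run_all(gse, config_path="project_config.yaml", output_dir="output", dry_run=True):
    """Download, write the config, then convert; with dry_run=True only print."""
    dirs = data_dirs()
    download = geofetch_command(gse)
    config_text = project_config(annotation_path(gse, dirs["SRAMETA"]), output_dir)
    convert = looper_command(config_path)
    if dry_run:
        print(" ".join(download))
        print(config_text, end="")
        print(" ".join(convert))
        return
    subprocess.run(download, check=True)  # raises CalledProcessError on failure
    with open(config_path, "w") as fh:
        fh.write(config_text)
    subprocess.run(convert, check=True)
```

With dry_run=True (the default), calling run_all with a real GSE accession only prints the two commands and the generated config, which is a cheap way to check your environment variables and paths before downloading anything.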