/sra-to-sample

Simple script to download and organize SRA files to samples

Primary LanguagePython

sra-to-sample downloads SRA files, extracts the reads, merges multiple runs and organizes them in a nice directory structure. The input to the script is a JSON file which can be created manually or extracted from the GEO SOFT files using geo-soft-to-json.py

Note that when multiple SRA codes are associated with a sample, as in the first line of the example JSON, the files are concatenated in one fastq file. Files reads.2.fastq.gz are only created if paired-end sequencing was used.

For example:

{ "samples":[
    {"id": "sample_id_1", "name": "sample_name_1", "sra": ["SRRXXXXX", "SRRXXXXX"]},
    {"id": "sample_id_2", "name": "sample_name_2", "sra": ["SRRXXXXX"]}
]}

results in the following directory structure

sample_name_1
  |-fastq/reads.1.fastq.gz
  |-fastq/reads.2.fastq.gz
  README
sample_name_2
  |-fastq/reads.1.fastq.gz
  |-fastq/reads.2.fastq.gz
  README