RNA-seq alignment pipeline for SRA data from formatted sampleTable

AlignPip-crossBBR

Bulk RNA-Seq alignment and quantification pipeline that uses STAR for alignment and featureCounts (Rsubread) for quantification.

Design

The pipeline takes curated sample tables that list the SRA accessions and the read layout (paired or single end) of each sample, with replicate accessions separated by ";". Each table is processed in the following steps:

  1. Downloads the accessions and writes the FASTQ files.
  2. Aligns the FASTQ files to a STAR-generated index of the GRCh38 Homo sapiens genome (release 107, primary assembly).
  3. Reports the counts extracted by featureCounts from Rsubread.
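
As a rough illustration of step 2, the sketch below prints the kind of STAR command that aligns a single run. It is not taken from the pipeline's Snakefiles: the function name `star_command`, the FASTQ file names (`<run>_1.fastq.gz`, `<run>_2.fastq.gz`, `<run>.fastq.gz`), the default thread count and the output prefix are all assumptions made for the example. Only the STAR options themselves are standard.

```shell
#!/usr/bin/env bash
# Sketch only: build a STAR command line for one SRA run.
# Assumed (hypothetical) inputs: gzipped FASTQ files named after the run,
# and an existing STAR genome index directory.

# star_command RUN LAYOUT GENOME_DIR [THREADS]
# Prints the command instead of running it, so it can be inspected first.
star_command() {
    local run="$1" layout="$2" genome_dir="$3" threads="${4:-8}"
    local reads
    case "$layout" in
        PAIRED) reads="${run}_1.fastq.gz ${run}_2.fastq.gz" ;;  # two mate files
        SINGLE) reads="${run}.fastq.gz" ;;                      # one file
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
    # --readFilesCommand zcat lets STAR read the .gz files directly.
    printf 'STAR --runThreadN %s --genomeDir %s --readFilesIn %s --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix %s_\n' \
        "$threads" "$genome_dir" "$reads" "$run"
}
```

For example, `star_command SRR000001 PAIRED /path/to/star_index 4` prints a paired-end command for a placeholder accession. The settings the pipeline actually uses are the ones in config.yaml and Modules/Align/Snakefile.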

Running the pipeline

Assumptions

  1. The 26th column of the sampleTable (1-based indexing) must contain the SRA run accessions (SRRXXXXXX), with replicates separated by ";".
  2. One column of the sampleTable must contain the read type (a factor with levels SINGLE or PAIRED).
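
To check that a table meets the first assumption, the accessions can be listed before starting the pipeline. The helper below is a minimal sketch and not part of the repository: the name `list_runs` is hypothetical, and it assumes a plain comma-separated file with a header row and no quoted fields that contain commas.

```shell
#!/usr/bin/env bash
# Sketch only: list every SRA run accession in a sampleTable, one per line.
# Assumption: simple CSV (comma-delimited, header row, no embedded commas).

# list_runs TABLE [COLUMN]
# COLUMN is 1-based and defaults to 26, the column the pipeline expects.
list_runs() {
    local table="$1" col="${2:-26}"
    awk -F',' -v col="$col" '
        NR == 1 { next }                      # skip the header row
        {
            field = $col
            gsub(/"/, "", field)              # drop quotes around the field
            n = split(field, runs, ";")       # replicates are ";"-separated
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", runs[i])   # trim whitespace
                if (runs[i] != "") print runs[i]
            }
        }
    ' "$table"
}
```

Running `list_runs sampleTable.csv | sort | uniq -d` then shows any accession that appears more than once.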

Steps

  1. Install mamba

  2. Clone the repository

       git clone https://github.com/zgr2788/AlignPip-crossBBR.git
    
  3. Place the sampleTable.csv you want to use in the main directory of the repository

  4. Run make and follow the steps

  5. Adjust settings through config.yaml

  6. (Optional) Run dag.sh to get a directed acyclic graph (DAG) of the jobs

  7. Set up all cluster variables in pip.sh and delete all module load statements from Modules/SRActions/Snakefile and Modules/Align/Snakefile. This step is necessary because the pipeline was originally written to run on the TOSUN cluster at Sabancı.

  8. Download the FASTQ files in .gz format, using one of the two options below.
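
For step 7, the `module load` statements can be removed by hand or with a small helper such as the sketch below. The function name `strip_module_loads` is hypothetical, and the sketch assumes each `module load` statement sits on its own line. If a statement shares a line with another command, edit that line manually instead.

```shell
#!/usr/bin/env bash
# Sketch only: delete "module load" lines from the given files.
# Assumption: every module load statement is on a line of its own.

# strip_module_loads FILE...
# Edits the files in place and keeps a .bak copy of each original.
strip_module_loads() {
    local f
    for f in "$@"; do
        # Match lines whose first non-blank text is "module load".
        sed -i.bak '/^[[:space:]]*module load/d' "$f"
    done
}
```

A possible call from the repository root is `strip_module_loads Modules/SRActions/Snakefile Modules/Align/Snakefile`. Running `grep -n 'module load'` on both files beforehand shows which lines would be affected.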

Option 1

The downloadTable{Layout}.sh scripts are highly recommended if Aspera Connect is installed. In that case, run:

		bash Modules/SRActions/downloadTable{Layout}.sh {path/to/runlist} {path/to/sshkey}

If Aspera fails for some downloads, failed{layout.txt} files are generated so that the failed runs are easy to follow up.

Option 2

Modules/SRActions/fastqWrite.sh must be configured with a conda environment that provides parallel-fastq-dump, together with a local install of sra-toolkit. The local install is needed because the current conda package of parallel-fastq-dump does not install an up-to-date sra-toolkit.
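
For orientation, a parallel-fastq-dump call that writes gzipped FASTQ files could look like the sketch below. This is an illustration and not the content of fastqWrite.sh: the function name `pfd_command`, the default thread count and the `fastq` output directory are assumptions, and treating PAIRED runs with `--split-files` is a common convention rather than something the repository confirms.

```shell
#!/usr/bin/env bash
# Sketch only: build a parallel-fastq-dump command line for one SRA run.
# Assumption: PAIRED runs are split into two mate files, SINGLE runs are not.

# pfd_command RUN LAYOUT [THREADS] [OUTDIR]
# Prints the command instead of running it.
pfd_command() {
    local run="$1" layout="$2" threads="${3:-4}" outdir="${4:-fastq}"
    local cmd="parallel-fastq-dump --sra-id $run --threads $threads --outdir $outdir --gzip"
    case "$layout" in
        PAIRED) cmd="$cmd --split-files" ;;   # one file per mate
        SINGLE) ;;                            # single output file
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
    printf '%s\n' "$cmd"
}
```

Before running such a command, `command -v fastq-dump` and `fastq-dump --version` show whether the locally installed sra-toolkit, and not an older copy from the conda environment, is the one found on the PATH.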

Appendices

Sample DAG

Notes

A total of 323 experiments were downloaded with this pipeline:

156 PAIRED
15  SC (Single Fastq File)
152 SINGLE

To reproduce the alignments, use STAR version 2.7.0 and Rsubread version 2.8.2 with the sampleTable provided.