RNA-seq Analysis Pipeline

Repository for scripts used to perform the following analyses of short paired-end RNA-seq reads:

Differential expression (DE)
Gene ontology (GO) term enrichment
Gene set enrichment (GSE)

Running Scripts

Before running any script, set the input and output paths in the inputPaths.txt and outputPaths.txt files in the InputData directory.
Be sure to read the usage notes at the beginning of the file for any script that you intend to run.
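
As a rough sketch of how a script could look up a path from one of these files: the example below assumes each line has the form "label: /some/path". That layout is an assumption made for illustration, and the get_path function is hypothetical, so check the actual files in the InputData directory before relying on it.

```shell
#!/bin/bash
# ASSUMPTION: each line of the paths file looks like "label: /some/path".
# The real inputPaths.txt layout may differ; get_path is a hypothetical helper.

get_path() {
  local label="$1"
  local file="$2"
  # Take the first line starting with "label:" and strip the label and leading spaces
  grep -m 1 "^${label}:" "$file" | sed "s/^${label}:[[:space:]]*//"
}
```

For example, get_path SOME_LABEL InputData/inputPaths.txt would print the path stored under SOME_LABEL, where SOME_LABEL is a placeholder for a label that actually appears in the file.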

Running Scripts on Servers

To submit a Bash job script to the queue: qsub SCRIPTNAME.sh INPUT_1 ... INPUT_N
To view the jobs you have submitted and corresponding task ID numbers: qstat -u USERNAME
To delete a job from the queue: qdel TASKIDNUMBER
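
The three queue commands above can be wrapped in small helper functions. This is only a sketch: the function names and the DRY_RUN variable are hypothetical, and the dry-run default is there so the commands can be previewed on a machine without a job scheduler.

```shell
#!/bin/bash
# Hypothetical helpers around the qsub, qstat and qdel commands listed above.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run_or_print() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Submit a job script together with its inputs
submit_job() {
  run_or_print qsub "$@"
}

# List the jobs submitted by one user
list_jobs() {
  run_or_print qstat -u "$1"
}

# Delete a job by its task ID number
delete_job() {
  run_or_print qdel "$1"
}
```

With DRY_RUN=1, calling submit_job SCRIPTNAME.sh INPUT_1 INPUT_2 prints the qsub command line; setting DRY_RUN=0 runs it instead.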

Running Scripts Locally

bash SCRIPTNAME.sh INPUT_1 ... INPUT_N

Alternative Method of Running Scripts Locally

To make the script executable before running (Bash scripts are not compiled): chmod +x SCRIPTNAME.sh
To run the executable script: ./SCRIPTNAME.sh INPUT_1 ... INPUT_N
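
For orientation, here is a minimal sketch of how a script started in either of these ways might consume INPUT_1 ... INPUT_N. The process_inputs function and its messages are hypothetical and are not taken from the repository's scripts.

```shell
#!/bin/bash
# Hypothetical skeleton, not one of the repository's scripts.
# Usage: bash SCRIPTNAME.sh INPUT_1 ... INPUT_N

process_inputs() {
  # Refuse to continue without at least one input
  if [ "$#" -eq 0 ]; then
    echo "Usage: SCRIPTNAME.sh INPUT_1 ... INPUT_N" >&2
    return 1
  fi
  # Visit each positional argument in the order it was given
  local input
  for input in "$@"; do
    echo "Processing: $input"
  done
}

# Forward the command-line arguments only when some were supplied
if [ "$#" -gt 0 ]; then
  process_inputs "$@"
fi
```

Quoting "$@" keeps inputs that contain spaces intact as single arguments.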

Resources

Required Software

FastQC: A quality control tool for high-throughput raw sequence data. It generates quality reports for NGS data and gives pass/warn/fail results for the following checks: Per base sequence quality, Per sequence quality scores, Per base sequence content, Per base GC content, Per sequence GC content, Per base N content, Sequence length distribution, Sequence duplication levels, Overrepresented sequences, Kmer content. It also has a graphical user interface.
Trimmomatic: A flexible read trimming tool for Illumina NGS data. It can trim adapter sequences and remove low-quality reads and bases.
HISAT2: A fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) to a population of human genomes as well as to a single reference genome. It builds on HISAT and Bowtie2 and uses a graph FM index (GFM) to index the genome before read mapping.
TopHat2: A spliced read mapper for RNA-Seq. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
Bowtie2: An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. Bowtie2 first extracts "seed" substrings in reads, aligns seeds in an ungapped way, and then performs extension in a gapped way.
Cufflinks: Assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. A published protocol paper describes how to use it in a pipeline.
Cuffdiff: Differential analysis of gene regulation at transcript resolution with RNA-seq. It estimates expression at transcript-level resolution and controls for variability across replicate libraries.
Samtools: Utilities for the Sequence Alignment/Map (SAM) format. Samtools has multiple commands for processing SAM/BAM files. The sub-command "samtools flagstat" can be used to print statistics for SAM/BAM files using the FLAG field.
HTSeq-count: A tool from the HTSeq package that counts mapped reads for genomic features.
edgeR: Empirical Analysis of Digital Gene Expression Data. It performs differential expression analysis on raw read counts and implements a range of statistical methods based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests.
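
As one small illustration of the tools listed above, the sketch below wraps the "samtools flagstat" sub-command with two sanity checks. The print_alignment_stats function is hypothetical, and aligned.bam in the usage note is a placeholder file name rather than a file from this repository.

```shell
#!/bin/bash
# Hypothetical helper around "samtools flagstat"; not part of the repository.

print_alignment_stats() {
  local bam="$1"
  # Stop early if the alignment file does not exist
  if [ ! -f "$bam" ]; then
    echo "Alignment file not found: $bam" >&2
    return 1
  fi
  # Stop early if samtools is not installed or not on the PATH
  if ! command -v samtools >/dev/null 2>&1; then
    echo "samtools is not on the PATH" >&2
    return 1
  fi
  # Print summary statistics derived from the FLAG field of each record
  samtools flagstat "$bam"
}
```

Calling print_alignment_stats aligned.bam prints the flagstat summary when both checks pass, and otherwise returns a non-zero status with a message on standard error.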

ElizabethBrooks/GBCF_RNASeqAnalysisPipeline
