RNA-Seq pipelines that uses HISAT2 and Kallisto for alignment (pseudo-alignment and abundance calculations in case of the latter).
The dataset downloaded was GSE120534. Once the accession list is downloaded, download the SRA toolkit (https://github.com/ncbi/sra-tools) and run the following:
prefetch --option-file SRR_Acc_List.txt
Once the files are download, retrieve the FASTQ files the following way:
for file in *; do fastq-dump --split-files "$file"; done
- Transcriptome index for Homo sapiens if Kallisto is used. Alternatively, index file can also be built using
kallisto index
. - Genome Index for Homo sapiens if HISAT2 is used. Alternatively, index file can also be built using
hisat2-build
.
-a | --aligner-to-use
: Specify 1
if you want to use Kallisto or 2
if you want to use HISAT2. DEFAULT: Kallisto
-i | --input-files-directory
: Enter the path of the directory containing FASTQ files.
-r | --reference-index
: Enter the path of the reference index. 1. If Kallisto is selected, then enter the path of indexed reference transcriptome (Ends with .idx); 2. If HISAT2 is selected, then enter the path of indexed reference genome with the prefix
-o | --output-directory
: Enter name of directory which will contain output of respective aligner.
- Run
./aligner_wrapper.py -a 1 -i <FASTQ files directory> -r <reference index directory> -o <aligner output directory>
to use Kallisto for pseudo-alginment and generate the abundance files.
Alternatively, the following bash script can be run in the directory where the input files are present:
for f in `ls *.fastq | sed 's/_[12].fastq//g' | sort -u`
do
kallisto quant -i ../reference/homo_sapiens/transcriptome.idx -o kallisto_output/${f} ${f}_1.fastq ${f}_2.fastq
done
- Run
./aligner_wrapper.py -a 2 -i <FASTQ files directory> -r <reference index directory>genome -o <aligner output directory>
to use HISAT2 for alignment.
Alternatively, the following bash script can be run in the directory where the input files are present:
for f in `ls *.fastq | sed 's/_[12].fastq//g' | sort -u`
do
hisat2 -x ../reference/grch38/genome -1 ${f}_1.fastq -2 ${f}_2.fastq -S ${f}.sam
done
- Run
./differential_expression.R -de DESeq -o <kallisto_output_directory> -m <meta_data_file> -p <pvalue_cutoff>
to import transcript abundance files from Kallisto and perform DE analysis with DESeq2.
- Run
./differential_expression.R -de Sleuth -o <kallisto_output_directory> -m <meta_data_file> -p <pvalue_cutoff>
to use Sleuth after quantifying with Kallisto.