(In development, don't install)
The following programs must be installed to align RNA-seq data with ezRNAseq:
- GNU Parallel: a shell tool for running jobs in parallel across multiple cores on one computer or across multiple computers.
- STAR: software for aligning RNA sequencing data.
- SAMtools: utilities for manipulating alignment data in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format.
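Pandoc is required for the function ez_report(). See https://github.com/rstudio/rmarkdown/blob/master/PANDOC.md for details. If RStudio Server is installed, its bundled Pandoc can be copied into your home directory: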
cp -r /usr/lib/rstudio-server/bin/pandoc/ ~/bin/pandoc/
cp -r /usr/lib/rstudio-server/bin/pandoc/pandoc-citeproc ~/bin/pandoc/
Add this to your shell profile (for example ~/.bashrc). pandoc-citeproc is copied into ~/bin/pandoc above, so one PATH entry covers both programs:
export PATH="$PATH:$HOME/bin/pandoc"
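Before going further, it can help to confirm that every external program is visible on your PATH. The sketch below is not part of ezRNAseq: the helper name `missing_tools` is hypothetical, and the executable names (`parallel`, `STAR`, `samtools`, `pandoc`, `pandoc-citeproc`) are the usual ones for these tools and may differ on your system.

```shell
#!/bin/sh
# Print the name of every program in the argument list that is NOT on PATH.
# missing_tools is a hypothetical helper, not an ezRNAseq function.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names for the programs listed above.
missing=$(missing_tools parallel STAR samtools pandoc pandoc-citeproc)

if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All required programs were found."
fi
```

Open a new shell (or re-source your profile) before running the check so that the updated PATH is in effect.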
Download the reference genome (takes about 40 min):
library(ezRNAseq)
install_igenome("hg19", result_dir = "~/genomes")
Build the STAR index (takes about 45 min):
fasta_file = "~/genomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa"
gtf_file = "~/genomes/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf"
starIndex_dir = "~/genomes/Homo_sapiens/UCSC/hg19/Sequence/StarIndex"
create_star_index(fasta_file, gtf_file,
thread = 25, starIndex_dir = starIndex_dir)
fasta_file and gtf_file are the paths to the FASTA and GTF files, respectively.
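For orientation, this is roughly what building a STAR index looks like when STAR is called directly from the shell. Whether create_star_index() issues exactly this command is an assumption; the flags shown are standard STAR options, and the helper name `star_index_cmd` is hypothetical. The function only prints the command, so you can inspect it before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a STAR genomeGenerate command.
# Arguments: FASTA file, GTF file, output index directory, thread count.
star_index_cmd() {
    printf 'STAR --runMode genomeGenerate --runThreadN %s --genomeDir %s --genomeFastaFiles %s --sjdbGTFfile %s\n' \
        "$4" "$3" "$1" "$2"
}

# Same paths and thread count as in the R example above.
genome_dir="$HOME/genomes/Homo_sapiens/UCSC/hg19"
star_index_cmd \
    "$genome_dir/Sequence/WholeGenomeFasta/genome.fa" \
    "$genome_dir/Annotation/Genes/genes.gtf" \
    "$genome_dir/Sequence/StarIndex" \
    25
```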
- Reads are mapped to the reference genome with STAR.
- Required programs: STAR and SAMtools
- Required external data: reference genome
Outputs:
- Name-sorted BAM files
- Raw count data
- Count data normalized with DESeq2
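The mapping step can be pictured as two shell commands: a STAR alignment followed by a name sort with SAMtools. This is a sketch under assumptions, not necessarily the commands ezRNAseq runs. The helper name `map_cmds`, the sample name, the FASTQ file names, and the paired-end, gzip-compressed input are all hypothetical; the STAR and samtools options themselves are standard ones. The function prints the commands rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper: print the commands to align one paired-end sample
# with STAR and then name-sort the resulting BAM with samtools.
# Arguments: STAR index dir, read 1 FASTQ, read 2 FASTQ, sample name, threads.
map_cmds() {
    index_dir=$1; r1=$2; r2=$3; sample=$4; threads=$5
    # With this prefix, STAR writes its unsorted BAM as <sample>_Aligned.out.bam
    printf 'STAR --runThreadN %s --genomeDir %s --readFilesIn %s %s --readFilesCommand zcat --outSAMtype BAM Unsorted --outFileNamePrefix %s_\n' \
        "$threads" "$index_dir" "$r1" "$r2" "$sample"
    # samtools sort -n sorts by read name instead of by coordinate
    printf 'samtools sort -n -@ %s -o %s_namesorted.bam %s_Aligned.out.bam\n' \
        "$threads" "$sample" "$sample"
}

map_cmds "$HOME/genomes/Homo_sapiens/UCSC/hg19/Sequence/StarIndex" \
    sampleA_R1.fastq.gz sampleA_R2.fastq.gz sampleA 8
```

Printing one pair of commands per sample also makes it straightforward to hand the resulting list to GNU Parallel when several samples need to be processed.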