OMrnaseq, developed at OnMath, is designed for the analysis of RNA-seq data. It includes four major modules: fastqc, mapping, quant and enrich.
- qc: uses FastQC to examine basic information (data size, GC content, Q30) about the fastq files.
- mapping: uses STAR to map reads to the genome.
- quant: uses kallisto to quantify abundances and edgeR to perform differential analysis.
- enrich: uses goseq and KOBAS to analyze the enrichment of differentially expressed genes in GO terms and KEGG pathways.
OMrnaseq is still under development, so it is better to install it in a virtualenv, which makes it easier to keep up with updates. virtualenvwrapper is a set of extensions to the virtualenv tool that makes virtualenv management more convenient.
# install virtualenvwrapper
pip install virtualenvwrapper
# configure your bash profile
# add the following two commands to your ~/.bash_profile
export WORKON_HOME=/your/virtualenv/path
source /usr/bin/virtualenvwrapper.sh
# make the OMrnaseq virtualenv
mkvirtualenv OMrnaseq
# enter the OMrnaseq virtualenv
workon OMrnaseq
Before installing OMrnaseq, you need to install omplotr and rnaReport.
omplotr is an R package that OMrnaseq needs to generate plots during analysis.
rnaReport is needed for OMrnaseq to generate a report.
Now that you are in the OMrnaseq virtualenv, you can download the OMrnaseq source code from GitHub and install it in the environment.
# download
git clone https://github.com/bioShaun/OMrnaseq.git
# install
cd OMrnaseq
pip install -e .
mrna \
-p /path/of/analysis \
-s /path/of/sample_inf \
-f /path/of/fastqs \
-w parallels_number \
fastqc
- sample_inf: tab-delimited text file indicating biological replicate relationships; see example.
- fastqs: fastq files named in the format sample_1.clean.fq.gz, sample_2.clean.fq.gz.
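Before launching a run, it can help to confirm that every sample has both read files in the naming format described above. The sketch below is not part of OMrnaseq: `check_fastqs` is a hypothetical helper, and it assumes that sample names sit in the second tab-delimited column of sample_inf and contain no whitespace. Check the example file for the real layout and pass a different column number if needed.

```shell
# Hypothetical helper, not part of OMrnaseq.
# Assumption: sample names are in column 2 of the tab-delimited sample_inf file
# (override with a third argument) and contain no whitespace.
# The body runs in a subshell so its variables do not leak into the caller.
check_fastqs() (
    sample_inf=$1
    fastq_dir=$2
    sample_col=${3:-2}
    status=0
    for sample in $(cut -f "$sample_col" "$sample_inf"); do
        for mate in 1 2; do
            fq="${fastq_dir}/${sample}_${mate}.clean.fq.gz"
            if [ ! -e "$fq" ]; then
                echo "missing: $fq"
                status=1
            fi
        done
    done
    # Non-zero exit status if any expected file was not found.
    exit "$status"
)

# usage: check_fastqs /path/of/sample_inf /path/of/fastqs
```

The function prints one line per missing file and returns a non-zero status if anything is missing, so it can gate the `mrna` call in a wrapper script.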
mrna \
-p /path/of/analysis \
-s /path/of/sample_inf \
-f /path/of/fastqs \
-w parallels_number \
--star_index /path/to/star/index \
mapping
mrna \
-p /path/of/analysis \
-s /path/of/sample_inf \
-f /path/of/fastqs \
-w parallels_number \
--gene2tr /gene/transcript/map/file \
--kallisto_idx /path/to/kallisto/index \
quant
- gene2tr: file containing 'gene(tab)transcript' identifiers per line; see example.
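If you do not already have a gene-to-transcript map, one way to build it is from a GTF annotation. The following is a minimal sketch under assumptions, not the method used by OMrnaseq: `gtf_to_gene2tr` is a hypothetical helper, and it assumes the GTF has `transcript` feature lines whose ninth column carries `gene_id "..."` and `transcript_id "..."` attributes. Compare the output with the example file before using it.

```shell
# Hypothetical helper, not part of OMrnaseq.
# Assumption: the input is a GTF with 'transcript' feature lines whose
# attribute column (column 9) contains gene_id "..." and transcript_id "...".
gtf_to_gene2tr() {
    awk -F '\t' '$3 == "transcript" {
        gene = ""; tr = ""
        # gene_id " is 9 characters long, transcript_id " is 15;
        # the offsets below strip the key and the surrounding quotes.
        if (match($9, /gene_id "[^"]+"/))
            gene = substr($9, RSTART + 9, RLENGTH - 10)
        if (match($9, /transcript_id "[^"]+"/))
            tr = substr($9, RSTART + 15, RLENGTH - 16)
        # Emit gene(tab)transcript, one pair per line.
        if (gene != "" && tr != "") print gene "\t" tr
    }' "$1"
}

# usage: gtf_to_gene2tr annotation.gtf > gene2tr.txt
```

Annotations that only contain `exon` lines would need the feature filter changed and the output de-duplicated (for example with `sort -u`).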
# --kegg_background is optional and defaults to the value of --kegg_abbr
mrna \
-p /path/of/analysis \
-w parallels_number \
-n result_name \
--go /go/annotation/file \
--gene_length /gene/length/file \
--kegg_blast /gene/blast/to/kegg/pep/tab/outfile \
--kegg_abbr species_kegg_abbr \
--kegg_background species_kegg_background_abbr \
--gene_list_file /file/of/gene/list/path \
enrich
- go: file containing 'gene(tab)go_ids' per line; go_ids are separated by ","; see example.
- gene_length: file containing 'gene(tab)gene_length' per line; see example.
- kegg_blast: BLAST result of the genes against the KOBAS peptide sequences; see example.
- gene_list_file: file containing the paths of the gene list files; see example.
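Because the go file has a fixed layout, a quick format check can catch problems such as a wrong separator before the enrich module runs. This is a sketch only: `check_go_file` is a hypothetical helper that is not part of OMrnaseq, and it assumes GO ids are written in the usual `GO:` plus digits form.

```shell
# Hypothetical helper, not part of OMrnaseq.
# Assumption: each line is gene(tab)go_ids, with go_ids written as
# GO:<digits> and separated by commas (no spaces).
check_go_file() {
    awk -F '\t' '
        # Report any line that is not exactly two tab-separated fields
        # or whose second field is not a comma-separated list of GO ids.
        NF != 2 || $2 !~ /^GO:[0-9]+(,GO:[0-9]+)*$/ {
            print "line " NR ": " $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# usage: check_go_file /go/annotation/file
```

The function prints each offending line with its line number and returns a non-zero status if any line fails the check.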
rnaseq is a collection of modules: qc, quant and enrich, so you can run all three modules in one command.
# run rnaseq
mrna \
-p /path/of/analysis \
-s /path/of/sample_inf \
-f /path/of/fastqs \
-w parallels_number \
--gene2tr /gene/transcript/map/file \
--kallisto_idx /path/to/kallisto/index \
--go /go/annotation/file \
--gene_length /gene/length/file \
--kegg_blast /gene/blast/to/kegg/pep/tab/outfile \
--kegg_abbr species_kegg_abbr \
rnaseq
# you can also combine modules yourself
mrna \
... # parameters required for each module
module1 \
module2