
MicroRNA-seq pipeline. Raw fastq files in. Analysis and figures of known miRNA out.


mirna-pipeline

This tool is a preprocessing pipeline that takes raw miRNA-seq fastq files through to the RBioMIR suite of analysis tools, producing an analysis of known miRNA as publication-ready figures.


Installation

Clone the repository, then use a Python 3 virtualenv to install the Python requirements.

>>> git clone https://github.com/liamhawkins/mirna-pipeline.git
>>> cd mirna-pipeline
>>> virtualenv venv -p $(which python3)
>>> source venv/bin/activate
>>> pip install -r requirements.txt

This pipeline also requires the following programs to be installed on your system:

Program        Version Tested
fastqc         0.10.1
fastq-mcf      1.05
cutadapt       1.17
bowtie-build   1.0.0
bowtie         1.0.0
samtools       1.3.1
Rscript        3.5.2
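
Before a long run, it can help to confirm that these programs are actually on your PATH. The following is a minimal sketch and not part of the pipeline itself: the function name `find_missing_programs` is hypothetical, and the check only confirms that each executable can be located, not that its version matches the tested one.

```python
import shutil

# Executable names copied from the list of required programs.
REQUIRED_PROGRAMS = [
    "fastqc",
    "fastq-mcf",
    "cutadapt",
    "bowtie-build",
    "bowtie",
    "samtools",
    "Rscript",
]


def find_missing_programs(programs, which=shutil.which):
    """Return the programs that cannot be located on the PATH.

    `which` is injectable so the lookup can be swapped out in tests.
    """
    return [name for name in programs if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_programs(REQUIRED_PROGRAMS)
    if missing:
        print("Missing programs: " + ", ".join(missing))
    else:
        print("All required programs were found.")
```

Run it inside the activated virtualenv so the same PATH is inspected that the pipeline will see.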

Usage

Create a config file (see example_config.ini for the exact template) for each analysis you wish to run.

The pipeline can then be run from the command line:

>>> pypipeline.py --config example_config.ini

Process and analyze multiple data sets

Multiple config files, each defining its own analysis, can be run in sequence by supplying a directory containing *.ini config files:

>>> pypipeline.py --config-dir dir_containing_configs/

In this case it is useful to suppress user prompts with the --no-prompts flag:

>>> pypipeline.py --no-prompts --config-dir dir_containing_configs/

Performing analysis only

If read counts are already available, you can perform the R analysis only using the --analysis-only flag:

>>> pypipeline.py --config example_config.ini --analysis-only dir_with_readcounts/

Read count file names must follow this format: <sample_name_from_config>_MATURE.read_count.txt

All command line arguments

A full list of command line options can be found using the help flag:

>>> pypipeline.py --help
usage: pypipeline.py [-h] [-c <config_file> | -d <config_dir>] [--no-prompts]
                     [--no-fastqc] [--delete]
                     [--no-analysis | --analysis-only <read_count_dir>]

optional arguments:
  -h, --help            show this help message and exit
  -c <config_file>, --config <config_file>
                        Path to config file
  -d <config_dir>, --config-dir <config_dir>
                        Directory containing config files
  --no-prompts          Suppress user prompts
  --no-fastqc           Do not perform fastqc on raw files
  --delete              Delete intermediate processing files
  --no-analysis         Do not perform R analysis
  --analysis-only <read_count_dir>
                        Run analysis only on read counts in supplied directory
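
The square brackets with a `|` in the usage line mark options that cannot be combined: --config with --config-dir, and --no-analysis with --analysis-only. One way such an interface can be declared with argparse is sketched below. This is reconstructed from the help text alone and is not the author's actual code; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options shown in the help text."""
    parser = argparse.ArgumentParser(prog="pypipeline.py")

    # Either a single config file or a directory of config files, not both.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-c", "--config", metavar="<config_file>",
                        help="Path to config file")
    source.add_argument("-d", "--config-dir", metavar="<config_dir>",
                        help="Directory containing config files")

    parser.add_argument("--no-prompts", action="store_true",
                        help="Suppress user prompts")
    parser.add_argument("--no-fastqc", action="store_true",
                        help="Do not perform fastqc on raw files")
    parser.add_argument("--delete", action="store_true",
                        help="Delete intermediate processing files")

    # Skipping the analysis and running only the analysis are contradictory.
    analysis = parser.add_mutually_exclusive_group()
    analysis.add_argument("--no-analysis", action="store_true",
                          help="Do not perform R analysis")
    analysis.add_argument("--analysis-only", metavar="<read_count_dir>",
                          help="Run analysis only on read counts in supplied directory")
    return parser
```

With this layout, argparse rejects a command line that passes both members of a group and exits with an error message.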

LICENSE

Link