systemPipeR

Introduction

systemPipeR is an R/Bioconductor package for building and running automated end-to-end analysis workflows for a wide range of research applications, including next-generation sequencing (NGS) experiments, such as RNA-Seq, ChIP-Seq, VAR-Seq and Ribo-Seq. Important features include a uniform workflow interface across different data analysis applications, automated report generation, and support for running both R and command-line software, such as NGS aligners or peak/variant callers, on local computers or compute clusters. The latter supports interactive job submissions and batch submissions to queuing systems of clusters. Efficient handling of complex sample sets and experimental designs is facilitated by a well-defined sample annotation infrastructure which improves reproducibility and user-friendliness of many typical analysis workflows in the NGS area.

Installation

To install the package, please use the BiocManager::install command:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("systemPipeR")

To obtain the most recent updates immediately, install the package directly from GitHub as follows:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("tgirke/systemPipeR", build_vignettes=TRUE, dependencies=TRUE)

Usage

Instructions for running systemPipeR are given in its main vignette (manual). The sample data set used in the vignette is provided by the data package systemPipeRdata. The expected format for defining NGS samples (e.g. FASTQ files) and their labels is given in targets.txt and targetsPE.txt (the latter is for paired-end reads).

With the latest Bioconductor Release 3.9, we are adopting for this functionality the widely used community standard Common Workflow Language (CWL) for describing analysis workflows in a generic and reproducible manner, and introducing the SYSargs2 workflow control class. For instance, the integration of CWL allows running systemPipeR workflows from a single specification instance either entirely from within R, from various command-line wrappers (e.g., cwl-runner), or from other languages (e.g., Bash or Python). The run parameters of command-line software are defined by param files that have a simplified YAML name/value structure. Here is a sample param file for Hisat2: hisat2.cwl.

Templates for setting up custom project reports are provided by systemPipeRdata. The corresponding PDFs of these report templates are linked here: systemPipeRNAseq, systemPipeRIBOseq, systemPipeChIPseq and systemPipeVARseq.
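As a rough illustration of the sample annotation format, the sketch below writes and re-reads a small tab-delimited targets file using base R only. The column names (FileName, SampleName, Factor), the file paths and the sample labels are assumptions chosen for illustration; the targets.txt file shipped with the package and the vignette remain the authoritative reference for the expected layout.

```r
# Hypothetical sample annotation: paths and labels are made up for illustration.
# Column names are assumed, not taken from this README.
targets <- data.frame(
  FileName   = c("./data/sampleA.fastq.gz", "./data/sampleB.fastq.gz",
                 "./data/sampleC.fastq.gz", "./data/sampleD.fastq.gz"),
  SampleName = c("M1A", "M1B", "A1A", "A1B"),
  Factor     = c("M1", "M1", "A1", "A1"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with one leading comment line.
targets_path <- tempfile(fileext = ".txt")
con <- file(targets_path, open = "w")
writeLines("# Hypothetical project header (comment line)", con)
write.table(targets, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Read the file back, skipping lines that start with '#'.
read_back <- read.delim(targets_path, comment.char = "#",
                        stringsAsFactors = FALSE)

# Group sample labels by experimental factor.
samples_by_factor <- split(read_back$SampleName, read_back$Factor)
print(samples_by_factor)
```

Keeping the annotation in a plain tab-delimited file means the same table can be inspected in a spreadsheet, versioned alongside the analysis code, and read with standard R functions.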

WorkFlow

| WorkFlow | Description | Version |
| --- | --- | --- |
| systemPipeChIPseq | ChIP-Seq Workflow Template | Stable |
| systemPipeRIBOseq | RIBO-Seq Workflow Template | Stable |
| systemPipeRNAseq | RNA-Seq Workflow Template | Stable |
| systemPipeVARseq | VAR-Seq Workflow Template | Stable |
| systemPipeMethylseq | Methyl-Seq Workflow Template | Experimental |
| systemPipeDeNovo | De novo transcriptome assembly Workflow Template | Experimental |
| systemPipeCLIPseq | CLIP-Seq Workflow Template | Experimental |
| systemPipeMetaTrans | Metatranscriptomic Sequencing Workflow Template | Experimental |

Slides