ramdaq

This pipeline analyses data from full-length single-cell RNA sequencing (scRNA-seq) methods.

Introduction

The pipeline is built using Nextflow, a workflow tool that runs tasks portably across multiple compute infrastructures. It comes with Docker containers, which make installation trivial and results highly reproducible.

Pipeline summary

  1. Read QC (FastQC)
  2. Adapter and quality trimming (FastqMcf)
  3. Trimmed read QC (FastQC)
  4. Read alignment, then sorting and indexing of alignments (HISAT2 and SAMtools)
  5. Quantification of gene-level and transcript-level expression (RSEM)
  6. Generation of BigWig (coverage) files (bam2wig)
  7. Mapping/alignment QC:
    • RSeQC
    • readcoverage.jl
  8. Quantification of gene-level expression (featureCounts)
  9. Quantification of rRNA reads (HISAT2 and SAMtools)
  10. Alignment and quantification of SIRV reads (HISAT2, SAMtools, and RSEM) (optional)
  11. HTML QC report for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)

Quick Start

i. Install nextflow

ii. Install either Docker or Singularity for full pipeline reproducibility (see docs). Note that ramdaq does not support conda.

iii. Download the pipeline automatically and test it on a minimal dataset with a single command

1. Example of test using Docker

nextflow run rikenbit/ramdaq -profile test,docker

2. Example of test using Singularity

nextflow run rikenbit/ramdaq -profile test,singularity
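
Which of the two test commands applies depends on the container engine installed on the machine. The following is a minimal sketch of that choice, assuming a POSIX shell; the function name `detect_profile` is hypothetical and not part of ramdaq. It prints the test command rather than running it.

```shell
# Hypothetical helper (not part of ramdaq): print the first command from the
# argument list that is found on PATH. With no arguments it checks docker,
# then singularity, whose names double as the ramdaq profile names.
detect_profile() {
  if [ "$#" -eq 0 ]; then
    set -- docker singularity
  fi
  for engine in "$@"; do
    if command -v "$engine" >/dev/null 2>&1; then
      printf '%s\n' "$engine"
      return 0
    fi
  done
  echo "no container engine found (tried: $*)" >&2
  return 1
}

# Print the matching test command instead of running it, so it can be reviewed.
if profile=$(detect_profile); then
  echo "nextflow run rikenbit/ramdaq -profile test,${profile}"
else
  echo "install Docker or Singularity first" >&2
fi
```

If both engines are installed, this sketch prefers Docker only because it is checked first; swap the order to prefer Singularity.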

iv. Start running your own analysis!

iv-i. You can run ramdaq without downloading reference annotation data.

nextflow run rikenbit/ramdaq -profile <docker/singularity> --reads '*_R{1,2}.fastq.gz' --genome GRCh38_v37

iv-ii. You can also run ramdaq by specifying a local path to the reference annotation (see 'Using provided reference genome and annotations').

nextflow run rikenbit/ramdaq -profile <docker/singularity> --reads '*_R{1,2}.fastq.gz' --genome GRCh38_v37 --local_annot_dir <The directory path where the reference genome and annotations are placed>
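
The two invocations above differ only in the optional `--local_annot_dir` argument. A small sketch of assembling either form from a few values is shown below; the function name `build_ramdaq_cmd` and the path `/path/to/annotations` are placeholders for illustration, not part of ramdaq. The function prints the command so it can be checked before it is run, and it keeps the reads pattern in single quotes so that the shell does not expand the glob before Nextflow sees it.

```shell
# Hypothetical wrapper (not part of ramdaq): print a ramdaq command line.
#   $1 = profile (docker or singularity)
#   $2 = reads glob, e.g. '*_R{1,2}.fastq.gz'
#   $3 = genome key, e.g. GRCh38_v37
#   $4 = optional local annotation directory
build_ramdaq_cmd() {
  case "$1" in
    docker|singularity) ;;
    *) echo "profile must be docker or singularity" >&2; return 1 ;;
  esac
  # Single quotes around the glob stop the shell from expanding it.
  cmd="nextflow run rikenbit/ramdaq -profile $1 --reads '$2' --genome $3"
  if [ -n "${4:-}" ]; then
    cmd="$cmd --local_annot_dir '$4'"
  fi
  printf '%s\n' "$cmd"
}

# Without and with a local annotation directory (the path is a placeholder).
build_ramdaq_cmd docker '*_R{1,2}.fastq.gz' GRCh38_v37
build_ramdaq_cmd docker '*_R{1,2}.fastq.gz' GRCh38_v37 /path/to/annotations
```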

See usage docs for all of the available options when running the pipeline.

Managing and handling ramdaq version

Pulling or updating ramdaq

To download or update ramdaq, run nextflow pull:

nextflow pull rikenbit/ramdaq

Checking available versions

To check the available versions, run nextflow info:

nextflow info rikenbit/ramdaq

The command prints output like the following. The line * master (default) indicates that the latest version is used when you execute nextflow run rikenbit/ramdaq ... without specifying a version:

$ nextflow info rikenbit/ramdaq
 project name: rikenbit/ramdaq
 repository  : https://github.com/rikenbit/ramdaq
 local path  : /Users/haruka/.nextflow/assets/rikenbit/ramdaq
 main script : main.nf
 description : This pipeline analyses data from full-length single-cell RNA sequencing (scRNA-seq) methods.
 author      : Mika Yoshimura and Haruka Ozaki
 revisions   :
 * master (default)
   dev
   1.0 [t]
   1.1 [t]
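
To use the revision list in a script, the names can be pulled out of that output. The sketch below assumes the layout shown above (a `revisions` header followed by one indented name per line, with `*` marking the default); the function name `list_revisions` is hypothetical, and the layout may differ between Nextflow versions.

```shell
# Hypothetical helper (not part of ramdaq or Nextflow): read `nextflow info`
# output on stdin and print one revision name per line.
list_revisions() {
  awk '
    /^ *revisions *:/ { in_revs = 1; next }   # start after the header line
    in_revs && NF {
      # Skip the "*" default marker if present, otherwise take the first field.
      print ($1 == "*" ? $2 : $1)
    }
  '
}

# Sample input copied from the output shown above.
sample_info=' revisions   :
 * master (default)
   dev
   1.0 [t]
   1.1 [t]'

printf '%s\n' "$sample_info" | list_revisions
```

With a working installation, the same function could be fed from the real command, for example `nextflow info rikenbit/ramdaq | list_revisions`.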

Using a specific version

To run a version other than the latest, pass its name with -r:

nextflow run rikenbit/ramdaq -r 1.1 ...

Documentation

The ramdaq pipeline comes with documentation about the pipeline, found in the docs/ directory:

  1. Installation
  2. Pipeline configuration
  3. Running the pipeline
  4. Output and how to interpret the results
  5. Troubleshooting

Credits

ramdaq is written and maintained by Mika Yoshimura and Haruka Ozaki in a collaboration between the Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, and the Bioinformatics Laboratory, Faculty of Medicine, University of Tsukuba.

ramdaq was originally developed based on the nf-core template.

Citation

DOI