Bisulfite Sequencing Pipeline

A Nextflow pipeline to align and quantify methylation (bisulfite) sequencing data.

The pipeline was created to run on the ETH Euler cluster and relies on the server's genome files. It therefore needs to be adapted before it can run on a different HPC cluster.

Pipeline steps

FastQC
FastQ Screen
Trim Galore
FastQC
Bismark
Bismark filter non-conversion [Optional]
Bismark deduplication
Bismark methylation extractor
coverage2cytosine [Optional]
Bismark2report
Bismark2summary
MultiQC

Required parameters

Path to the FASTQ files, given as a glob pattern that matches them inside their folder.

--input /cluster/work/nme/data/josousa/project/fastq/*fastq.gz

Output directory where the files will be saved.

--outdir /cluster/work/nme/data/josousa/project
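
The two required parameters can be combined into a single call, sketched below. The entry script name `main.nf` is a placeholder, since this README does not name it. The glob is kept in single quotes so that the pattern, rather than the expanded file list, is handed to Nextflow. That is the usual convention for Nextflow glob parameters, and whether this pipeline needs it is an assumption. The command is only printed, so it can be reviewed before it is run.

```shell
# Sketch: assemble a pipeline call with the two required parameters.
# ASSUMPTION: "main.nf" is a placeholder for the pipeline's entry script.
pipeline="main.nf"

# Example paths copied from this README; adjust them to your project.
input='/cluster/work/nme/data/josousa/project/fastq/*fastq.gz'
outdir='/cluster/work/nme/data/josousa/project'

# Single quotes around the values keep the glob unexpanded when the
# printed command is later pasted into a shell.
cmd="nextflow run $pipeline --input '$input' --outdir '$outdir'"

# Print the command for review instead of running it.
echo "$cmd"
```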

Genomes

Reference genome used for alignment.

--genome

Available genomes:

    GRCm39 # Default
    GRCm38
    GRCh38
    GRCh37 
    panTro6
    CHIMP2.1.4
    BDGP6
    susScr11
    Rnor_6.0
    R64-1-1
    TAIR10
    WBcel235
    E_coli_K_12_DH10B
    E_coli_K_12_MG1655
    Vectors
    Lambda
    PhiX
    Mitochondria

Option to use a custom genome for alignment by providing an absolute path to a custom genome file.

--custom_genome_file '/cluster/work/nme/data/josousa/project/genome/CHM13.genome'

Example of a genome file:

name           GRCm39
species        Mouse
bismark        /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/BismarkIndex/
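
A custom genome file can be checked before it is passed to `--custom_genome_file`. The sketch below assumes the file is a plain list of whitespace-separated key and value pairs, as in the example above, and that values contain no spaces. The helper `genome_field` is hypothetical and not part of the pipeline, and whether the pipeline requires all three keys is also an assumption.

```shell
# Sketch: write an example genome file and check its three keys.
# The file content is copied from the example in this README.
genome_file="$(mktemp)"
cat > "$genome_file" <<'EOF'
name           GRCm39
species        Mouse
bismark        /cluster/work/nme/genomes/Mus_musculus/Ensembl/GRCm39/Sequence/BismarkIndex/
EOF

# Hypothetical helper: print the value (second column) stored for a key
# (first column). Usage: genome_field KEY FILE
genome_field() {
    awk -v key="$1" '$1 == key { print $2 }' "$2"
}

# Collect every key from the example that has no value in the file.
missing=""
for key in name species bismark; do
    if [ -z "$(genome_field "$key" "$genome_file")" ]; then
        missing="$missing $key"
    fi
done

bismark_index="$(genome_field bismark "$genome_file")"
echo "bismark index: $bismark_index"
echo "missing keys:${missing:- none}"
```

On the cluster, a further check that the directory in `bismark` exists, for example with `[ -d "$bismark_index" ]`, would catch a mistyped path before the alignment starts.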

FastQ Screen optional parameters

Option to provide a custom FastQ Screen config file.

--fastq_screen_conf '/cluster/work/nme/software/config/fastq_screen.conf' # Default

Bismark optional parameters

Option to set the alignment mode to local.

--local

In this mode, a read is not required to align from one end to the other. Instead, some characters may be omitted (“soft-clipped”) from the ends to achieve the greatest possible alignment score.

Option to write all reads that could not be aligned to a file in the output directory.

--unmapped

Option to write all reads that fail to align uniquely, including those that produce more than one valid alignment with the same lowest number of mismatches, to a file in the output directory.

--ambiguous

Skipping and adding options

Option to skip FastQC, Trim Galore, and FastQ Screen, so that the pipeline starts at the Bismark alignment. --skip_qc
Option to skip FastQ Screen. --skip_fastq_screen
Option to skip Bismark deduplication. --skip_deduplication
Option to add Bismark filter non-conversion before deduplication (if selected) and before Bismark methylation extractor. --add_filter_non_conversion
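
The optional switches can be combined with the required parameters in one call. The sketch below uses only flags documented in this README. The entry script name `main.nf` is a placeholder, and the toggle variables and the helper `add_flag` are hypothetical names introduced for illustration. The command is printed rather than run.

```shell
# Sketch: build a call that mixes required and optional parameters.
# ASSUMPTION: "main.nf" is a placeholder for the pipeline's entry script.
pipeline="main.nf"

# Hypothetical toggles for the optional switches (1 = on, 0 = off).
use_local=1
keep_unmapped=1
keep_ambiguous=0
skip_fastq_screen=1

# Start with a genome from the list of available genomes.
opts="--genome GRCh38"

# Hypothetical helper: append flag $2 to opts when toggle $1 is 1.
add_flag() {
    if [ "$1" -eq 1 ]; then
        opts="$opts $2"
    fi
}

add_flag "$use_local" --local
add_flag "$keep_unmapped" --unmapped
add_flag "$keep_ambiguous" --ambiguous
add_flag "$skip_fastq_screen" --skip_fastq_screen

input='/cluster/work/nme/data/josousa/project/fastq/*fastq.gz'
outdir='/cluster/work/nme/data/josousa/project'

# Print the assembled command for review instead of running it.
cmd="nextflow run $pipeline --input '$input' --outdir '$outdir' $opts"
echo "$cmd"
```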

Extra arguments

Option to add extra arguments to FastQC. --fastqc_args
Option to add extra arguments to FastQ Screen. --fastq_screen_args
Option to add extra arguments to Trim Galore. --trim_galore_args
Option to add extra arguments to Bismark. --bismark_args
Option to add extra arguments to Bismark filter non-conversion. --filter_non_conversion_args
Option to add extra arguments to Bismark deduplication. --deduplicate_bismark_args
Option to add extra arguments to Bismark methylation extractor. --bismark_methylation_extractor_args
Option to add extra arguments to Bismark coverage2cytosine. --coverage2cytosine_args
Option to add extra arguments to Bismark2summary. --bismark2summary_args
Option to add extra arguments to Bismark2report. --bismark2report_args
Option to add extra arguments to MultiQC. --multiqc_args

Acknowledgements

This pipeline was adapted from the Nextflow pipelines created by the Babraham Institute Bioinformatics Group and from the nf-core pipelines. We thank all the contributors to both projects. We also thank the Nextflow and nf-core communities for their help and support.