/revelio

A lightweight utility for BS-seq and MethylC-seq data which applies a double-masking procedure on bisulfite alignments, facilitating variant calling with conventional software.

Primary LanguagePythonMIT LicenseMIT


This program takes an input BAM file and masks base positions that are potentially influenced by bisulfite-seq as observed from alignments between query and reference sequences. The script requires either an input FASTA sequence of the reference genome (slow) or a BAM file with MD tags (fast) as generated by 'samtools calmd'.

Requirements

The script requires Pysam which can be installed via Bioconda:

conda install -c bioconda pysam

Basic Usage

The script can run either on an input BAM file with proper MD tags such as those generated by 'samtools calmd', or by taking the reference genome fasta directly. Note that MD tags generated directly via bisulfite-aware alignment software may be inappropriate, as they sometimes follow a non-standard definition.

# using proper MD tags
samtools calmd -b input.bam genome.fa 1> calmd.bam 2> /dev/null
./revelio.py calmd.bam masked.bam

# using reference genome directly
./revelio.py -f genome.fa input.bam masked.bam