Predict 5mC in PacBio HiFi reads
Primrose predicts 5-methylcytosine (5mC) at each CpG site in PacBio HiFi reads, using a convolutional neural network. Methylation is assumed to be symmetric between strands. The output is reported in the forward direction with respect to the HiFi read sequence.
The latest version can be installed via the bioconda package primrose.
Please refer to our official pbbioconda page for information on Installation, Support, License, Copyright, and Disclaimer.
Version 1.3.0: Full changelog here
The input for primrose is PacBio HiFi reads with kinetics. You can generate HiFi reads with kinetics on the command line; more information is available at ccs.how:
ccs movie.subreads.bam movie.hifi_reads.bam --hifi-kinetics
Alternatively, you can use SMRT Link on your HPC or enable it directly in Run Design for Sequel IIe instruments.
Running primrose is as simple as:
primrose movie.hifi_reads.bam movie.5mc.hifi_reads.bam
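The two commands above can also be chained from a script. The following is a minimal sketch, assuming both `ccs` and `primrose` are on the `PATH`; the helper names `build_commands` and `run_pipeline` are hypothetical and only wrap the exact command lines shown in this README.

```python
import subprocess


def build_commands(subreads_bam, hifi_bam, out_bam):
    """Return the two README command lines as argument lists (hypothetical helper)."""
    # Step 1: generate HiFi reads with kinetics.
    ccs_cmd = ["ccs", subreads_bam, hifi_bam, "--hifi-kinetics"]
    # Step 2: predict 5mC and write the annotated BAM.
    primrose_cmd = ["primrose", hifi_bam, out_bam]
    return [ccs_cmd, primrose_cmd]


def run_pipeline(subreads_bam, hifi_bam, out_bam):
    """Run ccs, then primrose; check=True stops at the first failing step."""
    for cmd in build_commands(subreads_bam, hifi_bam, out_bam):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_pipeline(...) to execute them.
    for cmd in build_commands(
        "movie.subreads.bam", "movie.hifi_reads.bam", "movie.5mc.hifi_reads.bam"
    ):
        print(" ".join(cmd))
```

Passing argument lists rather than a single shell string avoids quoting problems with unusual file names.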
The output adheres to the SAM tag specification from 9 Dec 2021, using the Mm and Ml tags. It is also described in the PacBio BAM file formats as:
Tag | Type | Description |
---|---|---|
MM | Z | Base modifications / methylation |
ML | B,C | Base modification probabilities |
Notes for ML: The continuous probability range of 0.0 to 1.0 is remapped to the discrete integers 0 to 255 inclusively. The probability range corresponding to an integer N is N/256 to (N + 1)/256.
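To illustrate how the two tags fit together, here is a minimal pure-Python sketch that decodes one MM entry into read positions and maps each ML integer back to its probability range. It assumes a single forward-strand entry such as `C+m,...;`, where each number is the count of cytosines to skip before the next reported one, as in the SAM tag specification. The function names, the example read, and the tag values are made up for illustration and are not primrose output.

```python
def ml_to_range(n):
    """Map an ML integer N (0..255) to its probability range N/256 to (N + 1)/256."""
    if not 0 <= n <= 255:
        raise ValueError("ML values must be in 0..255")
    return n / 256, (n + 1) / 256


def mm_positions(seq, mm_entry):
    """Decode one forward-strand MM entry (e.g. 'C+m,0,1,0;') into 0-based read positions.

    Assumption: a single entry; each number is how many occurrences of the
    base are skipped before the next reported one.
    """
    header, *skips = mm_entry.rstrip(";").split(",")
    base = header[0]  # canonical base, 'C' for 5mC
    base_positions = [i for i, b in enumerate(seq) if b == base]
    positions = []
    index = -1
    for skip in skips:
        index += int(skip) + 1
        positions.append(base_positions[index])
    return positions


def methylation_calls(seq, mm_entry, ml_values):
    """Pair each reported position with the probability range of its ML value."""
    positions = mm_positions(seq, mm_entry)
    if len(positions) != len(ml_values):
        raise ValueError("MM and ML must describe the same number of sites")
    return [(pos, ml_to_range(n)) for pos, n in zip(positions, ml_values)]


if __name__ == "__main__":
    # Made-up read with CpG cytosines at positions 1, 5 and 8.
    read = "ACGTCCGACG"
    for pos, (low, high) in methylation_calls(read, "C+m,0,1,0;", [250, 12, 128]):
        print(f"C at {pos}: P(5mC) in [{low:.4f}, {high:.4f})")
```

In practice a BAM library would supply the read sequence and the tag values; the sketch only shows the arithmetic behind them.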
primrose scales nearly linearly with the number of threads, achieving 2 GBases of HiFi reads per minute on 16 cores. The memory footprint is very low, at ~20 MB per thread.
$ primrose movie.hifi_reads.bam out.bam -j 16 --log-level INFO
Reads : 100000
Yield : 1.8 GBases
Throughput : 2.0 GBases/min
Run Time : 54s 904ms
CPU Time : 16m 52s
Peak RSS : 0.313 GB
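The figures in this log can be cross-checked against the claims above with a little arithmetic. This is a rough sketch that takes the rounded values from the log as given and assumes 1 GB = 1024 MB and that per-thread memory accounts for the whole footprint; the variable names are illustrative.

```python
# Values copied from the example log above (already rounded there).
yield_gbases = 1.8
run_time_s = 54.904
threads = 16
mb_per_thread = 20  # "~20 MB per thread"

# Throughput in GBases per minute: yield divided by wall-clock minutes.
throughput = yield_gbases / (run_time_s / 60)

# Estimated memory: threads times per-thread footprint,
# converted assuming 1 GB = 1024 MB.
estimated_mb = threads * mb_per_thread
estimated_gb = estimated_mb / 1024

print(f"Throughput: {throughput:.2f} GBases/min")
print(f"Estimated memory: {estimated_mb} MB ({estimated_gb:.4f} GB)")
```

Under these assumptions both results land close to the reported 2.0 GBases/min and 0.313 GB peak RSS.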
HiFi reads and subreads for true negative and true positive CpG methylation sites are available at https://downloads.pacbcloud.com/public/Sequel-II-CpG-training/.
The true negatives are from HG002 Whole Genome Amplification (WGA). The true positives are from HG002 WGA + CpG Methyltransferase (M.SssI).
- 1.3.0
  - Latest developer version
- 1.2.0
  - Included in upcoming SMRT Link version
  - Use official basemod tags MM and ML per SAM spec Feb 2022
- 1.1.0
  - Add CLI call into new @PG header lines
  - Allow multiple BAM files via XML input
- 1.0.0
  - Initial release