How does dorado handle 3D data
In our experiment, we cross-linked chromosome fragments that may interact in three-dimensional space, then basecalled with dorado and called methylation. I need to split the cross-linked chromosome fragments and then align them back to the reference genome, but how do I carry the methylation information over to the split fragments? Or can dorado do the splitting together with the methylation calling?
Waiting for your reply
Thank you
This question isn't a dorado issue and is probably best asked on the Nanopore community forum.
However, if I understand your question correctly:
Dorado annotates mod-basecalled reads with the ML and MM tags but doesn't provide a way to split the reads into fragments.
You'll need to use other tools for post-processing, such as samtools, which should preserve these tags when splitting.
Kind regards,
Rich
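To make the role of the MM and ML tags concrete, here is a minimal Python sketch (not part of dorado or samtools) that decodes them into per-base calls and then restricts the calls to one fragment of the read. It assumes a forward-oriented, unclipped read sequence, one modification code per MM entry, and calls on the `+` strand only; the function names `decode_mm_ml` and `calls_in_fragment` are hypothetical. Because MM stores skip counts relative to the full read sequence, any tool that cuts a read has to recompute these values rather than copy the tags unchanged.

```python
def decode_mm_ml(seq, mm, ml):
    """Decode MM/ML into {mod_spec: [(read_pos, prob_byte), ...]}.

    Assumptions (illustrative only): `seq` is the full read as basecalled,
    each MM entry carries a single modification code, and all calls are on
    the '+' strand.
    """
    calls = {}
    ml_iter = iter(ml)  # ML values follow the order of the MM entries
    for entry in filter(None, mm.split(";")):
        fields = entry.split(",")
        spec = fields[0]                 # e.g. "C+m" or "C+m?"
        base, strand = spec[0], spec[1]
        codes = spec[2:].rstrip("?.")
        if strand != "+" or (len(codes) != 1 and not codes.isdigit()):
            raise NotImplementedError(f"unsupported MM entry: {spec}")
        # Read positions of the canonical base this entry refers to.
        candidates = [i for i, b in enumerate(seq) if base == "N" or b == base]
        idx = -1
        hits = []
        for delta in (int(x) for x in fields[1:]):
            idx += delta + 1             # skip `delta` candidate bases
            hits.append((candidates[idx], next(ml_iter)))
        calls[spec] = hits
    return calls


def calls_in_fragment(calls, start, end):
    """Keep calls inside read interval [start, end), in fragment coordinates."""
    return {
        spec: [(pos - start, prob) for pos, prob in hits if start <= pos < end]
        for spec, hits in calls.items()
    }


# Hypothetical read: cytosines at read positions 1, 2, 5 and 7.
example = decode_mm_ml("ACCGTCAC", "C+m,1,1;", [200, 30])
fragment = calls_in_fragment(example, 2, 6)
```

The ML values are bytes from 0 to 255 that encode the modification probability. A real workflow would still need to write new MM and ML tags for each fragment, which this sketch leaves out.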
Dear @HalfPhoton
I think I have encountered a similar issue and would like to ask you for help.
The use of long-read sequencing technologies in the context of three-dimensional genomics, protein-DNA interactions, and simultaneous DNA methylation profiling has become achievable.
As you are aware, incorporating three-dimensional genomic information can lead to the presence of a significant number of chimeric reads in the sequencing results. These chimeras can typically be effectively aligned to the reference genome through split mapping, where the split positions correspond to the digest sites or added linker sequences. However, due to the inherent accuracy issues associated with ONT base calling, splitting at the read level may not always be the optimal approach. Instead, splitting based on the alignment results relative to the reference genome might yield better outcomes.
Given the above context, I would like to ask whether BAM files generated from the base calling and alignment process using dorado can be directly used for quantitative methylation analysis with modkit.
Thank you for your time, and I look forward to your insights on this matter.
Best regards,
Zheng zhuqing
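The alignment-based splitting described in the comment above can be sketched roughly as follows. Each alignment record of a chimeric read (primary or supplementary) covers one interval of the original read, and that interval can be recovered from the CIGAR string and the strand. This is a minimal Python illustration, not an existing tool; the helper name `read_interval` and the example CIGAR strings are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_interval(cigar, reverse):
    """Return (start, end, read_length) of an alignment in original read coordinates.

    Soft (S) and hard (H) clips both count toward the original read length.
    For reverse-strand alignments the CIGAR runs in reference orientation,
    so the trailing clip corresponds to the start of the read as sequenced.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    # M, I, = and X consume read bases; D, N and P do not.
    aligned = sum(n for n, op in ops if op in "MI=X")
    start = trail if reverse else lead
    return start, start + aligned, lead + aligned + trail


# Hypothetical 150-base chimeric read with two forward-strand segments.
primary = read_interval("60M90S", reverse=False)
supplementary = read_interval("60H90M", reverse=False)
```

Intervals obtained this way mark where the read could be cut. Overlaps or gaps between adjacent intervals would need a policy of their own, which is outside the scope of this sketch.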
Hi @biozzq,
> I would like to ask whether BAM files generated from the base calling and alignment process using dorado can be directly used for quantitative methylation analysis with modkit.
Yes - the output from dorado can be used for methylation analysis in modkit.
> However, due to the inherent accuracy issues associated with ONT base calling, splitting at the read level may not always be the optimal approach. Instead, splitting based on the alignment results relative to the reference genome might yield better outcomes.
We're always working on improving basecalling accuracy and performance in Dorado to meet the needs of our users. I'll raise this use case with the team to discuss how we can better support these interesting workflows, especially with regard to read splitting if it's problematic in three-dimensional genomics.
Best regards,
Rich
Dear @HalfPhoton
Thank you for your prompt response. I have a few more questions. I obtained the modBAM file with dorado using the following command:
dorado-0.7.3-linux-x64/bin/dorado basecaller dna_r10.4.1_e8.2_400bps_sup@v4.2.0 ./pod5_pass --modified-bases 6mA 5mC_5hmC --reference Hg38.fa | samtools view -bhS > output.bam
I have attached a portion of the alignments in the file subset.bam.zip. Upon examining the methylation information in it, I found that the supplementary alignment records do not have MM and ML tags; see, for example, the alignment records for the read named "c59f6589-a8c9-4091-9e8e-3afca66085b5". Can modkit accurately assess the methylation information for the regions covered by those supplementary alignments?
subset.bam.zip
Thank you for your assistance.
Best regards,
Zheng zhuqing
Modification information is stripped from secondary and supplementary alignments unless soft clipping is enabled via the --mm2-opts "-Y" option.
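One way to confirm this behaviour on a given file is to tally, per record, the alignment type, whether the CIGAR contains hard clips, and whether both MM and ML tags are present. The sketch below is a minimal Python illustration that works on SAM text lines (for example, lines produced by `samtools view`). The helper name `summarize_record` and the two example records are hypothetical, and older lowercase `Mm`/`Ml` tags are not considered.

```python
SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment


def summarize_record(sam_line):
    """Summarize one SAM alignment line (not a header line)."""
    fields = sam_line.rstrip("\n").split("\t")
    name, flag, cigar = fields[0], int(fields[1]), fields[5]
    tags = {field[:2] for field in fields[11:]}  # optional fields start at column 12
    if flag & SUPPLEMENTARY:
        kind = "supplementary"
    elif flag & SECONDARY:
        kind = "secondary"
    else:
        kind = "primary"
    return {
        "name": name,
        "kind": kind,
        "hard_clipped": "H" in cigar,
        "has_mods": {"MM", "ML"} <= tags,
    }


# Hypothetical records: a primary alignment carrying modification tags and a
# hard-clipped supplementary alignment of the same read without them.
records = [
    "read1\t0\tchr1\t100\t60\t4M4S\t*\t0\t0\tACCGTCAC\t*\tMM:Z:C+m?,1,1;\tML:B:C,200,30",
    "read1\t2048\tchr2\t500\t60\t4H4M\t*\t0\t0\tTCAC\t*\tNM:i:0",
]
summaries = [summarize_record(line) for line in records]
```

If supplementary records show up as hard-clipped and without modification tags, re-running the alignment with the `-Y` option mentioned above should produce soft-clipped supplementary records instead.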