nanoporetech/remora

Regarding m6A modified basecalling

Closed this issue · 7 comments

Hi,

I was trying to do m6A modified basecalling of data generated from R9 flowcell by using trained RNN model i.e. res_dna_r941_min_modbases-all-context_v001 by using guppy v3.5.1.
But, unfortunately, I was unable to do modified basecalling. The basecalling was completed, but unfortunately the bam file generated after minimap2 was without any modifications. It would be nice to have your insights on the same or is there a alternative way to do modified basecalling for m6A.

Thanks

Minimap2 requires that the modified base tags be carried through the alignment stage. This can be completed with the -y option to minimap. If this does not resolve the issue, could you post the full set of commands from the raw data through to the desired output?

Hello @marcus1487
the -y option with minimap did not work out.
The full set of commands are as follows:
guppy_basecaller -i -s -c ~/workspace/rerio/basecall_models/res_dna_r941_min_modbases-all-context_v001.cfg --recursive --calib_reference ./PlasmoDB-61_PvivaxSal1_Genome.fasta -x cuda:0 -m ~/workspace/rerio/basecall_models/res_dna_r941_min_modbases-all-context_v001.jsn --barcode_kits "EXP-NBD104"
cat fastq_* > bigfile.fastq
minimap2 -t 8 -ay -x map-ont bigfile.fastq -o CE.sam
samtools fixmate -O bam,level=1 -m CE.sam fixmate.bam
samtools sort -l 1 -@8 -o pos.srt.bam -T a_prefix fixmate.bam
samtools markdup -O bam,level=1 pos.srt.bam markdup.bam
samtools view -@8 markdup.bam -o final.bam
samtools index final.bam
modkit pileup final.bam pileup.bed

Can you check at which stage you are losing the modified base calls starting with the output of guppy? Methods for confirming modified base tags can be found in the related thread on modkit. nanoporetech/modkit#159

@PRIYANKA-22091995 Once you know which step is dropping the tags, I can advise you on how to use modkit repair to potentially replace the tags. Here are two awk commands that should pull out the relevent text:

awk 'match($0,/Mm:Z.*/) {print substr($0,RSTART,RLENGTH)}' ${file} | cut -f 1
awk 'NR %4 == 1 && $0 ~ /Mm:Z:/'

Hello @ArtRand the two awk commands was run on fastq files and it gave no output.

Thanks

This indicates that the modified base calls are not found in the files tested. Please check that the original basecalls contain modified base calls or modify the original basecalling step to include modified base output.

Closing due to inactivity