nanoporetech/dorado

Issue with --estimate-poly-a


Hi,

I’m encountering an issue with the pt:i tag in the BAM file output. Despite adding primers to the polyaconfig.toml file, the pt:i values don’t align as expected with the corresponding reads. I’ve tried adjusting parameters like the flank threshold and primer definitions, but the values still seem inconsistent.

I’m working with POD5 files and exploring ways to calculate polyA lengths. Would it make sense to write a Python or Bash script that calculates the polyA length of each read in the FASTQ files, as a way to validate the pt:i values? Since I’m new to sequencing, I want to make sure I’m not overlooking something fundamental.
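For a rough, sequence-only cross-check, such a script could report the longest run of consecutive A bases (or T bases, for reads from the opposite strand) in each FASTQ record. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it only measures what the basecaller emitted, so it is not a reliable ground truth for pt:i, which dorado does not derive by counting basecalled bases.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from FASTQ lines, assuming plain 4-line records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header[1:].split()[0], seq

def longest_run(seq: str, base: str = "A") -> int:
    """Length of the longest uninterrupted run of `base` in `seq`."""
    runs = re.findall(f"{re.escape(base)}+", seq.upper())
    return max((len(r) for r in runs), default=0)

def naive_tail_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Per read, the longer of its longest A run and longest T run (strand unknown here)."""
    return {
        rid: max(longest_run(seq, "A"), longest_run(seq, "T"))
        for rid, seq in read_fastq(lines)
    }
```

It can be run over a file handle, e.g. `naive_tail_lengths(open("reads.fastq"))`, where `reads.fastq` stands in for one of your FASTQ files. Expect these numbers to disagree with pt:i, particularly for long tails.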

Any guidance would be greatly appreciated!

Hi @Suchi-alt,

What do you mean when you say:

the pt:i values don’t align as expected with the corresponding reads

Are you comparing these values to the actual basecalled sequence? The basecalled sequence within the polyA region is known to be unreliable for measuring tail length, which is why dorado estimates the true polyA length in a different way.
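To illustrate the general idea of estimating a tail length without relying on the basecalled bases, here is a simplified sketch, not dorado's actual implementation: if the tail's boundaries are located in the raw signal, its length in bases can be approximated by dividing the number of signal samples spanning the tail by the read's average number of samples per base. The function names and the numbers used with them are illustrative assumptions.

```python
def samples_per_base_for_read(num_samples: int, num_bases: int) -> float:
    """Average raw signal samples per base over the whole read (a crude proxy for translocation speed)."""
    if num_bases <= 0:
        raise ValueError("num_bases must be positive")
    return num_samples / num_bases

def estimate_tail_bases(tail_start: int, tail_end: int, samples_per_base: float) -> int:
    """Approximate tail length in bases from a span of raw signal samples.

    tail_start / tail_end: sample indices bounding the tail (end exclusive).
    """
    if samples_per_base <= 0:
        raise ValueError("samples_per_base must be positive")
    if tail_end < tail_start:
        raise ValueError("tail_end must not precede tail_start")
    return round((tail_end - tail_start) / samples_per_base)
```

With illustrative numbers, a read of 40,000 samples and 4,000 bases averages 10 samples per base, so a tail spanning samples 5,000 to 6,200 comes out at about 120 bases. The point is that this kind of estimate depends on the signal duration, not on how many A calls the basecaller happened to emit.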

Yes @malton-ont, I’ve been working with the raw POD5 files as input and was examining why the basecalled reads in the output seemed to differ from the pt:i values. I adjusted the flank thresholds and the tail_interrupt_length, as the plasmid I’m working with has a 120-base polyA homopolymer. I wasn’t able to get any output using --estimate-poly-a, so I’m also unsure how to validate this length.
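The idea behind tolerating interruptions in a tail can be sketched on a plain sequence: runs of A separated by at most `max_gap` other bases are merged into a single tail. This is a hypothetical illustration of the concept only; `max_gap` is an illustrative parameter, and dorado's own handling of tail_interrupt_length (which happens inside its tail detection) may differ.

```python
import re

def longest_merged_run(seq: str, max_gap: int, base: str = "A") -> int:
    """Span of the longest stretch of `base` runs joined across gaps of <= max_gap other bases.

    The returned length covers the whole span, including the interrupting bases.
    """
    runs = [(m.start(), m.end()) for m in re.finditer(f"{re.escape(base)}+", seq.upper())]
    if not runs:
        return 0
    best = 0
    cur_start, cur_end = runs[0]
    for start, end in runs[1:]:
        if start - cur_end <= max_gap:
            cur_end = end  # gap is small enough: extend the current tail
        else:
            best = max(best, cur_end - cur_start)  # close the current tail
            cur_start, cur_end = start, end
    return max(best, cur_end - cur_start)
```

For example, in `AAAACAAAAGGGAAA` a single C splits the first two runs: with `max_gap=0` the longest tail is 4 bases, with `max_gap=1` the first two runs merge into a 9-base span, and with `max_gap=3` everything merges into 15.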

@Suchi-alt,

If you're calling plasmids you should be setting plasmid_front_flank and plasmid_rear_flank rather than the primers. See the docs here.

So the sequences adjacent to the polyA (at its 5' and 3' ends) are used as the front and rear flanks as written, not their complements, right? That is what I understood from that page.

Yes, that is correct.
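If the plasmid's reference sequence is known, the two flanks can be read directly off it, in the same orientation as the reference and without complementing them. A minimal sketch; the function name, the default flank length of 20 bases, and the assumption that the longest A homopolymer in the reference is the tail are all illustrative choices.

```python
import re
from typing import Tuple

def plasmid_flanks(reference: str, flank_len: int = 20, min_tail: int = 10) -> Tuple[str, str]:
    """Return (front_flank, rear_flank): the bases immediately 5' and 3' of the
    longest polyA stretch in `reference`, as written (not complemented)."""
    ref = reference.upper()
    tails = [m for m in re.finditer(r"A+", ref) if len(m.group()) >= min_tail]
    if not tails:
        raise ValueError(f"no A homopolymer of at least {min_tail} bases found")
    tail = max(tails, key=lambda m: len(m.group()))
    front = ref[max(0, tail.start() - flank_len):tail.start()]  # 5' side
    rear = ref[tail.end():tail.end() + flank_len]  # 3' side
    return front, rear
```

The two returned strings are what would go into plasmid_front_flank and plasmid_rear_flank in the poly-A config.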

Thank you so much :)