marbl/MHAP

Make all subreads (in one ZMW) into one CCS sequence before using MHAP

Closed this issue · 2 comments

hi,
When PBcR maps all filtered subreads to the SEEDS long reads with MHAP, the SEEDS usually still contain other subreads from the same ZMW.
So what if all subreads from the same ZMW were rolled into one CCS sequence for the SEEDS, the CCS sequences then combined with the other seeds as SEEDS, and MHAP finally used to compute the overlaps?

Hi,

I think you're asking about the full PBcR pipeline rather than just MHAP. The default pipeline does not remove any reads, so yes, multiple reads split from the same ZMW can be corrected. Later steps in the assembly remove duplicate sequences.

You can remove all but the longest read from each ZMW if you want, but I don't think it's worth preprocessing them into a CCS sequence. You will most likely have regions of 1X coverage where the identity won't improve, and PBcR/MHAP is designed to overlap sequences at raw accuracy, so a single read per ZMW will still be corrected.
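For anyone who wants to try the "keep only the longest read per ZMW" filter, here is a minimal Python sketch. It assumes the reads are in FASTA and that read names follow the PacBio subread convention `movie/holeNumber/start_end`, so the first two `/`-separated fields identify the ZMW. The function names (`parse_fasta`, `zmw_key`, `longest_subread_per_zmw`) are hypothetical helpers for illustration, not part of PBcR or MHAP.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs; name is the first header token."""
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def zmw_key(read_name: str) -> str:
    """Return 'movie/holeNumber', ASSUMING the 'movie/holeNumber/start_end' naming."""
    parts = read_name.split("/")
    if len(parts) < 3:
        # Name does not match the assumed convention: treat the read as its own ZMW.
        return read_name
    return "/".join(parts[:2])


def longest_subread_per_zmw(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the longest sequence per ZMW; ties keep the first read seen."""
    best: Dict[str, Tuple[str, str]] = {}
    for name, seq in records:
        key = zmw_key(name)
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (name, seq)
    return list(best.values())


# Hypothetical toy input: two subreads from ZMW 10, one from ZMW 11.
example = """>movieA/10/0_500
ACGTACGTAC
>movieA/10/550_1200
ACGTACGTACGTACGT
>movieA/11/0_300
ACGT
"""

for name, seq in longest_subread_per_zmw(parse_fasta(example.splitlines())):
    print(f">{name}\n{seq}")
```

Length is measured on the sequence itself rather than on the `start_end` coordinates in the name, so the filter still works if the names are trimmed differently. This only removes the duplicate ZMW reads; it does not build a CCS sequence, in line with the advice above.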

I will try removing the duplicates. Thanks!