marbl/verkko

ONT & HiFi assembly with several N gaps

hq66 commented

Hi~
Verkko is user-friendly and powerful software. I assembled with ONT, HiFi, and Hi-C data, then used 3D-DNA to anchor the contigs into chromosomes. When I counted runs of Ns to detect gaps in the final assembly, I found some runs of irregular length, such as 1000 bp, 25065 bp, 111741 bp, and so on. 3D-DNA usually links non-overlapping contigs with runs of 500 Ns, so the other lengths look irregular. In the Hi-C matrix in Juicebox, these runs fall inside contigs, which suggests that Juicer did not treat them as gaps. How were these irregular-length runs of Ns generated, and what do they mean? Should they be regarded as gaps?
Thanks.

skoren commented

When verkko has phasing information (like Hi-C or trio), it can generate scaffolds where the resolution is ambiguous but the long-range structure is clear. The size of the gap in this case is an estimate of the sequence that could not be resolved. In other cases, there may be a gap in one haplotype but not in the other. Here, the gap size is an estimate of the missing sequence based on the other haplotype. When no estimate is possible, a fixed gap of 5 kb is used. This is described in more detail in the verkko manuscript: https://www.nature.com/articles/s41587-023-01662-6.

The latest version of verkko (1.4.1+) will have some information on the reason for the gap in the 8-hic*/*.paths.tsv like so:

haplotype2_from_utig4-1296  utig4-1296-,[N10591N:ambig_bubble],utig4-3142-  HAPLOTYPE2
haplotype2_from_utig4-1990  utig4-1990-,[N5000N:ambig_path],utig4-3120+     HAPLOTYPE2
haplotype2_from_utig4-223   utig4-223-,[N44884N:alt-utig4-504],utig4-282-   HAPLOTYPE2

and you can view the context around the gap in the noseq.gfa file output by verkko.
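
To list every gap with its size and reason, the gap tokens in those rows can be pulled out with a short script. This is only a minimal sketch: it assumes each row has whitespace-separated columns (path name, node path, haplotype label) and that gap tokens always look like `[N<size>N:<reason>]`, both inferred from the three example rows above and not from a verkko format specification. The names `GAP_TOKEN` and `parse_gaps` are made up for illustration.

```python
import re

# Gap tokens in the example rows look like [N10591N:ambig_bubble].
# Assumption: size is a plain integer and the reason contains no ']'.
GAP_TOKEN = re.compile(r"\[N(\d+)N:([^\]]+)\]")


def parse_gaps(lines):
    """Yield (path_name, gap_size_bp, reason) for every gap token found."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, path = fields[0], fields[1]
        for size, reason in GAP_TOKEN.findall(path):
            yield name, int(size), reason


# The three example rows from this thread.
example = """\
haplotype2_from_utig4-1296  utig4-1296-,[N10591N:ambig_bubble],utig4-3142-  HAPLOTYPE2
haplotype2_from_utig4-1990  utig4-1990-,[N5000N:ambig_path],utig4-3120+  HAPLOTYPE2
haplotype2_from_utig4-223   utig4-223-,[N44884N:alt-utig4-504],utig4-282-  HAPLOTYPE2
""".splitlines()

for name, size, reason in parse_gaps(example):
    print(f"{name}\t{size}\t{reason}")
```

To run it on a real assembly, pass an open file handle for the paths.tsv file to `parse_gaps` in place of `example`. Rows without a gap token, such as a header line, produce no output.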