vgteam/vg

distance index slow to compute snarls


I think I've found a graph where using the distance index without distances is much slower than the old snarl manager.

This came up because I've started running path normalization (vg paths -n) as part of the mc pipeline on the unclipped graphs. On this graph it has been running for more than a day. I've had it in gdb for a few hours, and it's still in vg::fill_in_distance_index().

It also reproduces on the command line: vg snarls finishes in about 10 minutes, but the vg index version has been going for hours.

cd /private/home/hickey/dev/work/path-norm/

# slow
vg index chr2.vg.gfaffixed.clip -j chr2.vg.gfaffixed.clip.dist --snarl-limit 0

# fast
vg snarls chr2.vg.gfaffixed.clip > chr2.vg.gfaffixed.clip.snarls

@glennhickey Do you still have this graph somewhere? I think I have a fix, but I want to test it.

Yeah, it looks like I deleted that folder, but I'm 95% sure you can reproduce it with this graph:

/private/home/hickey/dev/work/hprc-v2-prerelease/hprc-v2.prereease-mc-chm13/hprc-v2.prereease-mc-chm13.chroms/chr2.vg

Thanks!