distance index slow to compute snarls
glennhickey commented
I think I've found a graph where using the distance index without distances is much slower than the old snarl manager.
This came up for me because I've started running path normalization (vg paths -n) as part of the mc pipeline on the unclipped graphs. On this graph it has been running for more than a day. I've had it in gdb for a few hours, and it's still in vg::fill_in_distance_index().
It also reproduces on the command line: vg snarls finishes in about 10 minutes, but the vg index version has been going for hours.
cd /private/home/hickey/dev/work/path-norm/
# slow
vg index chr2.vg.gfaffixed.clip -j chr2.vg.gfaffixed.clip.dist --snarl-limit 0
# fast
vg snarls chr2.vg.gfaffixed.clip > chr2.vg.gfaffixed.clip.snarls
xchang1 commented
@glennhickey Do you still have this graph somewhere? I think I have a fix, but I want to test it.
glennhickey commented
Yeah, it looks like I deleted that folder, but I'm 95% sure you can reproduce it with this graph:
/private/home/hickey/dev/work/hprc-v2-prerelease/hprc-v2.prereease-mc-chm13/hprc-v2.prereease-mc-chm13.chroms/chr2.vg
xchang1 commented
Thanks!