XiaoTaoWang/NeoLoopFinder

Number of Redundant Candidates in assemble-complexSVs

alobo4 opened this issue · 4 comments

Hello,

I am running assemble-complexSVs on an HCC1954 Hi-C dataset (GSM3258551, SRR7475914), the same cell line used in the NeoLoopFinder paper. The command is taking an extremely long time (>4 days) even with 256MB of memory requested, as the numbers of redundant candidates are 84996, 141704, and 114354 for the 5kb, 10kb, and 25kb resolutions, respectively. You mentioned in your tutorial that assemble-complexSVs should take only about 6 minutes to complete. I understand that was a test example, but I am curious whether my numbers are expected for a bigger sample or whether something went wrong in previous steps. I inferred my SVs using EagleC with the NeoLoopFinder output. Here is the command I am running and what the log file shows:
assemble-complexSVs -O HCC1954 -B HCC1954.CNN_SVs.NeoLoopFinder.txt --balance-type CNV --protocol insitu --nproc 6 \
    -H HCC1954-MboI-R1-filtered.mcool::resolutions/25000 \
    HCC1954-MboI-R1-filtered.mcool::resolutions/10000 \
    HCC1954-MboI-R1-filtered.mcool::resolutions/5000

root INFO @ 12/02/22 12:13:04:
# ARGUMENT LIST:
# Output Prefix = HCC1954
# Break Points = HCC1954.CNN_SVs.NeoLoopFinder.txt
# Minimum fragment size = 500000bp
# Cooler URI = ['HCC1954-MboI-R1-filtered.mcool::resolutions/25000', 'HCC1954-MboI-R1-filtered.mcool::resolutions/10000', 'HCC1954-MboI-R1-filtered.mcool::resolutions/5000']
# Extended Genomic Span = 5000000bp
# Balance Type = CNV
# Experimental protocol = insitu
# Number of Processes = 6
# Log file name = assembleSVs.log
root INFO @ 12/02/22 12:13:24: Current resolution: 25000
root INFO @ 12/02/22 12:13:24: Calculate the global average contact frequencies at each genomic distance ...
root INFO @ 12/02/22 12:14:04: Done
root INFO @ 12/02/22 12:14:04: Filtering SVs by checking distance decay of chromatin contacts across SV breakpoints ...
root INFO @ 12/02/22 12:17:52: 296 SVs left
root INFO @ 12/02/22 12:17:52: Building SV connecting graph ...
root INFO @ 12/02/22 12:17:52: Discovering and re-ordering complex SVs ...
neoloop.assembly INFO @ 12/02/22 12:20:02: Filtering 114354 redundant candidates ...
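
For anyone comparing runs, the redundant-candidate counts can be pulled out of a log like the one above with a short script. This is only a sketch: the two message patterns are copied from the log lines shown in this post, and the helper name `redundant_candidates_by_resolution` is hypothetical, not part of NeoLoopFinder.

```python
import re

# Patterns taken from the log lines quoted above; other NeoLoopFinder
# versions may word these messages differently (assumption).
RESOLUTION_RE = re.compile(r"Current resolution: (\d+)")
CANDIDATES_RE = re.compile(r"Filtering (\d+) redundant candidates")


def redundant_candidates_by_resolution(log_text):
    """Map each resolution (bp) to the candidate count logged after it."""
    counts = {}
    current = None
    for line in log_text.splitlines():
        match = RESOLUTION_RE.search(line)
        if match:
            # Remember which resolution the following lines belong to.
            current = int(match.group(1))
            continue
        match = CANDIDATES_RE.search(line)
        if match and current is not None:
            counts[current] = int(match.group(1))
    return counts


sample = """\
root INFO @ 12/02/22 12:13:24: Current resolution: 25000
root INFO @ 12/02/22 12:17:52: 296 SVs left
neoloop.assembly INFO @ 12/02/22 12:20:02: Filtering 114354 redundant candidates ...
"""
print(redundant_candidates_by_resolution(sample))
```

To run it on a real log, read the file named in the argument list (assembleSVs.log here) and pass its contents to the function.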

Hi, thanks for reporting this. I recently updated NeoLoopFinder so that it can deal with smaller SVs than what we analyzed in the original paper, but didn't notice the running time complexity issue. I will take a look at this and get back to you later this week or next week.

Best,
Xiaotao

Hi, can you upgrade your NeoLoopFinder to the latest version (v0.4.3) by running pip install -U neoloop and try again? In my test, the job finished within 1 hr with this version.
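
To confirm that the upgrade took effect in the environment where assemble-complexSVs runs, the installed version can be checked from Python. This is a minimal sketch: it assumes the distribution name is `neoloop` (as in the pip command above), the helpers `as_tuple` and `needs_upgrade` are hypothetical, and the comparison only handles plain dotted version strings.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (0, 4, 3)  # the version suggested in the reply above


def as_tuple(version_string):
    """Turn '0.4.3' into (0, 4, 3); stop at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(installed, minimum=MIN_VERSION):
    """Return True if the installed version is older than the minimum."""
    return as_tuple(installed) < minimum


try:
    installed = version("neoloop")
except PackageNotFoundError:
    print("neoloop is not installed in this environment")
else:
    if needs_upgrade(installed):
        print(f"neoloop {installed} is older than 0.4.3; run: pip install -U neoloop")
    else:
        print(f"neoloop {installed} is recent enough")
```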