COBS postprocessing slower than COBS

Question

karel-brinda opened this issue 2 years ago · 2 comments

This is a note for the future.

When COBS reports too many matches (e.g., large Illumina experiments, where short reads produce many matches), the post-processing of its output becomes the bottleneck, even though it is very simple: essentially just removing IDs and filtering matches based on the maximum number of best hits.

Answer 1 · 2022-09-21T15:02:13.000Z

This is the filtering script: https://github.com/karel-brinda/mof-search/blob/08c28f8366ad35ffcb79f9953b3668494e47c38a/scripts/postprocess_cobs.py

Answer 2 · 2022-09-21T15:03:31.000Z

Idea for the future: once the first match of a query is rejected here, the remaining matches can be skipped directly, so there is no need to repeatedly extract the number of k-mers: https://github.com/karel-brinda/mof-search/blob/08c28f8366ad35ffcb79f9953b3668494e47c38a/scripts/postprocess_cobs.py#L38
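
A minimal sketch of that early-exit idea, not the actual code of `postprocess_cobs.py`. It assumes that each query starts with a header line beginning with `*`, that each match line is `reference<TAB>kmer_count`, and that the matches of a query are listed in decreasing order of k-mer count. The function name `filter_cobs_matches`, the parameter `max_best_hits`, and the exact rejection rule (keep the top `max_best_hits` distinct k-mer counts, ties included) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, Optional


def filter_cobs_matches(lines: Iterable[str], max_best_hits: int) -> Iterator[str]:
    """Yield query headers and the matches whose k-mer count is among the
    top `max_best_hits` distinct counts of that query (ties are kept).

    Assumptions (not confirmed by the thread): header lines start with '*',
    match lines are 'reference<TAB>kmer_count', and matches are sorted by
    decreasing k-mer count within each query.
    """
    skipping = False                  # True once a match of this query was rejected
    distinct_counts = 0               # distinct k-mer counts seen for this query
    last_count: Optional[int] = None  # k-mer count of the previously parsed match

    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("*"):
            # New query: reset the per-query state.
            # The header is passed through unchanged (its match count is not updated).
            skipping = False
            distinct_counts = 0
            last_count = None
            yield line
            continue
        if skipping:
            # Early exit: the remaining matches are dropped without
            # splitting the line or converting the k-mer count.
            continue
        _, kmers_str = line.rsplit("\t", 1)
        kmers = int(kmers_str)
        if kmers != last_count:
            distinct_counts += 1
            last_count = kmers
        if distinct_counts > max_best_hits:
            # First rejected match: everything after it is at most as good.
            skipping = True
            continue
        yield line
```

With this structure, the per-line cost after the first rejection is a single `startswith` check, which should matter most for queries with very long match lists. The early exit is only valid if the sorted-input assumption holds.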