karel-brinda/Phylign

COBS postprocessing slower than COBS

karel-brinda opened this issue · 2 comments

This is a note for the future.

When COBS reports too many matches (e.g., in large Illumina experiments, where the short reads produce many matches), the post-processing of its output becomes the bottleneck. This happens even though the post-processing is very simple: it essentially just removes IDs and filters the matches based on the maximum number of best hits.
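A minimal Python sketch of this kind of post-processing pass, for illustration only. It makes three assumptions. First, the COBS output is grouped per query, with a header line `*<query><TAB><count>` followed by one `<reference><TAB><kmer_count>` line per match, sorted by decreasing k-mer count. Second, "max best hits" means keeping the top `max_best_hits` matches plus any ties with the last kept one. Third, the function name `keep_best_hits` is hypothetical, and the ID-removal step is omitted. The sketch parses the k-mer count of every match line, and that per-line work is what dominates when there are many matches.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def keep_best_hits(
    lines: Iterable[str], max_best_hits: int
) -> Iterator[Tuple[str, List[Tuple[str, int]]]]:
    """Yield (query, kept_matches) for each query block of COBS-like output.

    Hypothetical baseline: every match line is split and its k-mer count
    converted to int, even when the match can no longer be kept.
    """
    query: Optional[str] = None
    kept: List[Tuple[str, int]] = []
    n_seen = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("*"):
            # Assumed header format: "*<query>\t<number of matches>"
            if query is not None:
                yield query, kept
            query = line[1:].split("\t")[0]
            kept = []
            n_seen = 0
            continue
        name, kmers_str = line.split("\t")
        kmers = int(kmers_str)  # parsed for every match line
        # Keep the first max_best_hits matches, plus ties with the last kept one.
        if n_seen < max_best_hits or (kept and kmers == kept[-1][1]):
            kept.append((name, kmers))
        n_seen += 1
    if query is not None:
        yield query, kept
```

For example, with `max_best_hits=2`, a query whose matches have k-mer counts 30, 25, 25, 10 keeps the first three (the third is a tie) and drops the last one.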


Idea for the future: once the first match is rejected here, all remaining matches for that query can be skipped directly, with no need to repeatedly extract the number of k-mers: https://github.com/karel-brinda/mof-search/blob/08c28f8366ad35ffcb79f9953b3668494e47c38a/scripts/postprocess_cobs.py#L38
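A minimal sketch of the early exit, assuming (as in the linked script's setting, but not verified here) that each query's matches arrive sorted by decreasing k-mer count and that the rule keeps the top `max_best_hits` matches plus ties. Under those assumptions, a rejected match has a strictly lower count than the last kept one, so every later match is rejected too. The function `keep_best_hits_early_exit` is hypothetical. It also returns how many lines were actually parsed, to make the saving visible.

```python
from typing import List, Sequence, Tuple


def keep_best_hits_early_exit(
    match_lines: Sequence[str], max_best_hits: int
) -> Tuple[List[Tuple[str, int]], int]:
    """Filter one query's match lines ("<reference>\t<kmer_count>"),
    stopping at the first rejected match.

    Returns the kept matches and the number of lines whose k-mer count was parsed.
    """
    kept: List[Tuple[str, int]] = []
    parsed = 0
    for i, line in enumerate(match_lines):
        name, kmers_str = line.rstrip("\n").split("\t")
        kmers = int(kmers_str)
        parsed += 1
        # Keep the first max_best_hits matches, plus ties with the last kept one.
        if i < max_best_hits or (kept and kmers == kept[-1][1]):
            kept.append((name, kmers))
        else:
            # Sorted input: every later match would be rejected as well,
            # so stop without splitting or parsing the remaining lines.
            break
    return kept, parsed
```

With counts 30, 25, 25, 10, 9, 1 and `max_best_hits=2`, only the first four lines are parsed. In a streaming parser, the skipped lines still have to be read to reach the next query header, but the per-line `split`/`int` work is avoided.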