The splitting of contigs into _long and _short should only be performed when needed, or at least it should be skippable

Question


Closed this issue 5 months ago · 1 comment

Description of feature

Hello!

As the title says, the logic that splits contigs into _long and _short should only run when it is needed, i.e. when --run_bgc_screening is enabled. Otherwise, the samples should be kept whole. Currently, there is no way to skip this behavior.

As an aside, renaming samples with the suffixes _long and _short isn't ideal, as those suffixes end up appearing in all of the reports generated by the pipeline.

Ideally, the contig filtering should ONLY be done when needed (i.e., specifically for the BGC tools that fail on short contigs). For those problematic tools, the filtered long contigs should simply replace the original sample's contigs, so no sample renaming would be required. I hope this makes sense.
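A minimal Python sketch of the proposed behavior, under stated assumptions: contigs are plain sequence strings, `bgc_tool_x` is a hypothetical stand-in for whichever BGC tool fails on short contigs, and `prepare_inputs` and `min_length` are illustrative names, not part of the pipeline. Each tool receives the sample under its original ID; only the length-sensitive tool gets the filtered contigs, and only when BGC screening is on.

```python
from typing import Dict, List, Tuple

# Hypothetical: which tools are length-sensitive is an assumption for illustration.
LENGTH_SENSITIVE_TOOLS = {"bgc_tool_x"}


def prepare_inputs(
    samples: Dict[str, List[str]],
    tools: List[str],
    run_bgc_screening: bool,
    min_length: int,
) -> Dict[Tuple[str, str], List[str]]:
    """Map (sample_id, tool) to the contigs that tool should receive.

    Contigs are filtered by length only for length-sensitive tools and only
    when BGC screening is enabled; every other tool gets the whole sample.
    Sample IDs are never renamed, so no _long/_short suffixes appear.
    """
    inputs: Dict[Tuple[str, str], List[str]] = {}
    for sample_id, contigs in samples.items():
        for tool in tools:
            if run_bgc_screening and tool in LENGTH_SENSITIVE_TOOLS:
                # Filtered long contigs replace the original sample for this tool only.
                inputs[(sample_id, tool)] = [c for c in contigs if len(c) >= min_length]
            else:
                # All other tools see the unfiltered sample.
                inputs[(sample_id, tool)] = list(contigs)
    return inputs
```

With this layout, a report for any tool is keyed by the original sample ID, which avoids the suffixes leaking into downstream summaries.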

Answer 1 · 2024-06-17T06:16:35.000Z

The change of approach and the removal of the suffixes were done in #381.