The splitting of contigs into _long and _short should only be performed when needed, or at least it should be skippable
Description of feature
Hello!
As the title says, the whole logic behind splitting contigs into _long and _short should only be run when needed, i.e. when --run_bgc_screening is enabled. Otherwise, the samples should be kept whole. There is currently no way of skipping this behavior.
As an aside, the renaming of samples with the suffixes _long and _short isn't ideal, as those suffixes end up appearing in all of the reports generated by the pipeline.
Ideally, the contig filtering should ONLY be done when needed (i.e. specifically for the BGC tools that fail with short contigs). In those cases, for the problematic tools, the filtered long contigs should simply replace the corresponding original sample, so no sample renaming would actually be required. I hope this makes sense.
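
To make the proposal concrete, here is a rough Python sketch of the intended logic. It is not the pipeline's actual code: the function name, the tool names and the length cutoff are all hypothetical and only illustrate the behavior described above, where the sample ID is never changed and only length-sensitive tools receive the filtered contigs.

```python
from typing import FrozenSet, List, Tuple

# A contig is represented here as (contig name, sequence).
Contig = Tuple[str, str]

# Hypothetical values for illustration only; not taken from the pipeline.
MIN_CONTIG_LENGTH = 3000
LENGTH_SENSITIVE_TOOLS: FrozenSet[str] = frozenset({"bgc_tool_a", "bgc_tool_b"})


def contigs_for_tool(
    sample_id: str,
    contigs: List[Contig],
    tool: str,
    run_bgc_screening: bool,
    min_length: int = MIN_CONTIG_LENGTH,
) -> Tuple[str, List[Contig]]:
    """Return the (sample_id, contigs) pair that a given tool should receive.

    The sample ID is returned unchanged in every case, so no _long/_short
    suffix can leak into downstream reports.
    """
    needs_filtering = run_bgc_screening and tool in LENGTH_SENSITIVE_TOOLS
    if not needs_filtering:
        # Sample is kept whole: no splitting, no renaming.
        return sample_id, contigs

    # Only the long contigs are passed on, under the original sample ID.
    long_contigs = [(name, seq) for name, seq in contigs if len(seq) >= min_length]
    return sample_id, long_contigs
```

With this shape, a tool that is not length-sensitive, or any run without --run_bgc_screening, sees the original sample untouched, while a problematic BGC tool sees the same sample ID with the short contigs dropped.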