bcgsc/transabyss

Transabyss default param setting issue

Closed this issue · 4 comments

Hi,

I'm running Trans-ABySS v2.0.1 with default settings.
When checking the run logs, I was a little surprised to see the following message:
warning: the seed-length should be at least twice k: k=32, s=32
And indeed, according to the manual/help, the default for s is set to k.
Would it not be better to set the default for s to k*2 instead?

thx.

That warning message is intended for genomic assembly, to prevent assembling duplicate sequences, which is less of a concern for transcriptome assembly.

Hi @sjackman,

ok, fair enough, thanks.
Would it make sense to increase s to avoid assembling paralogous (and/or recently duplicated) genes together?

If that were your preference, then yes, you could increase s to decrease over-assembly of paralogous sequence.

kmnip commented

The s parameter was intentionally set to k (within Trans-ABySS) during the paired-end contig assembly stages, because many unitigs of length k are part of the correct path for assembly into transcripts. This is particularly true for highly expressed transcripts.
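
For readers following along, the condition behind the warning quoted above can be sketched as below. This is only an illustration of the relationship between k and s stated in the message, not the actual ABySS or Trans-ABySS code; the function name `seed_length_warning` is hypothetical, and the exact comparison (`s < 2 * k`) is an assumption inferred from the wording "at least twice k".

```python
def seed_length_warning(k, s):
    """Return a warning string if the seed length s is below 2*k, else None.

    Hypothetical helper for illustration only; the threshold is assumed
    from the wording of the warning message in this thread.
    """
    if s < 2 * k:
        return f"warning: the seed-length should be at least twice k: k={k}, s={s}"
    return None


# With the Trans-ABySS default discussed here (s == k), the warning fires.
print(seed_length_warning(32, 32))

# With s raised to 2*k, the check passes and nothing is reported.
print(seed_length_warning(32, 64))
```

As discussed in the thread, raising s to silence the warning is a trade-off: it may reduce over-assembly of paralogous sequence, but it would also exclude the short unitigs that the default deliberately keeps as seeds.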