BioJulia/BioSequences.jl

Converting a LongSubSeq to LongSequence can give weird results

ian-small opened this issue · 1 comment

Expected Behavior

A round trip LongSequence -> LongSubSeq -> LongSequence should produce a copy of the original sequence.

Current Behavior

Under some circumstances, converting a LongSubSeq to a LongSequence alters some bases at the start of the sequence to ambiguity codes.

Steps to Reproduce (for bugs)

julia> test = dna"CATTTTTTTTTTTTTTT"
17nt DNA Sequence:
CATTTTTTTTTTTTTTT

julia> testview = LongSubSeq(test, 1:17)
17nt DNA Sequence:
CATTTTTTTTTTTTTTT

julia> LongSequence(testview)
17nt DNA Sequence:
YATTTTTTTTTTTTTTT

This is a simple example; if test is a longer sequence, more ambiguous bases appear in the final sequence.
As far as I have seen, to trigger this bug(?), the view must start at the first position of the sequence and must be longer than 16 bases.
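
The conditions described above can be probed with a short script. This is only a sketch: `roundtrips` is a hypothetical helper name (not part of BioSequences), and it assumes that plain range indexing (`seq[range]`) is unaffected by the bug, so it can serve as the reference result. It checks view lengths on either side of the 16-base threshold, all starting at position 1.

```julia
using BioSequences

# Hypothetical helper (not part of BioSequences): returns true if converting
# a view over `range` back to a LongSequence gives the same bases as
# indexing the original sequence with that range.
function roundtrips(seq::LongSequence, range::UnitRange{Int})
    subseq = LongSubSeq(seq, range)
    return LongSequence(subseq) == seq[range]
end

test = dna"CATTTTTTTTTTTTTTT"

# View lengths just below, at, and above the 16-base threshold.
for len in (15, 16, 17)
    println(len, " => ", roundtrips(test, 1:len))
end
```

On an affected version, the report above suggests only the 17-base view should print `false`; on a fixed version all three should print `true`.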

Context

I was writing a simple short-read assembler that builds a de Bruijn graph from views into the reads and then generates contigs by extending from the first view. Done this way, every contig starts with a run of ambiguity codes.

Your Environment

  • Package Version used: BioSequences v3.1.0
  • Julia Version used: 1.8.0
  • Operating System and version (desktop or mobile): macOS 12.6.1

Thanks for the bug report, @ian-small. This is fixed in #261. A new patch version with the fix will be released in ~15 minutes. It may take an hour or so before the update reaches the package servers, but then you should be able to update to 3.1.2 and get the fix.

If you want it to work in the meantime, you can also port the one-line fix from #261 into a local copy of the package by running dev BioSequences.
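
Both routes can be driven from Julia's standard `Pkg` API. The following is a sketch rather than an exact recipe: it assumes the default development directory (`~/.julia/dev`), and the one-line fix itself is not reproduced here, so it has to be taken from #261.

```julia
using Pkg

# Route 1: once the patched release has reached the package servers,
# update and confirm which version is installed (3.1.2 carries the fix,
# according to the comment above).
Pkg.update("BioSequences")
Pkg.status("BioSequences")

# Route 2 (interim workaround): switch to an editable checkout, by default
# under ~/.julia/dev/BioSequences, and apply the change from #261 there.
# Pkg.develop("BioSequences")

# When the release is available, return to the registered version.
# Pkg.free("BioSequences")
```

Route 2 is commented out so the block does not switch the environment to a development checkout unless that is actually wanted.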