/ReferenceSequences.jl

Data structure for reference sequences

Primary LanguageJuliaOtherNOASSERTION

ReferenceSequences

Build Status

ReferenceSequences.jl provides a data structure for reference sequences. It is common that a reference sequence contains only five kinds of nucleotides ("ACGTN") and the occurrence of 'N' is sparse and clustered. In such a case, ReferenceSequence can compress positions of 'N' and aggressively save memory space.

julia> using Bio.Seq

julia> using ReferenceSequences

# create ReferenceSequence from DNASequence of Bio.Seq
julia> seq = ReferenceSequence(dna"ACGT"^5 * dna"N"^10)
30nt Reference Sequence:
ACGTACGTACGTACGTACGTNNNNNNNNNN

julia> DNASequence(seq)  # round trip
30nt DNA Sequence:
ACGTACGTACGTACGTACGTNNNNNNNNNN

julia> seq[4]  # access an element
T

julia> seq[15:25]  # make a subsequence (copy-free)
11nt Reference Sequence:
GTACGTNNNNN

julia> seq[1:4] == dna"ACGT"  # comparison
true

TODO

  • FASTA parser