BioJulia/SequenceVariation.jl

Feature: Create functions to get reference and alternate bases from `Variation`

MillironX opened this issue · 0 comments

Expected behavior

There should be two new functions, `refbases(v::Variation{S,T})` and `altbases(v::Variation{S,T})`, which return the reference or alternate bases of `v` as an `S`, where `S <: BioSequence`. These functions should follow the VCF specification for representing alternate alleles:

> For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event ..., unless the event occurs at position 1 on the contig in which case it must include the base after the event

From the spec examples:

  1. `refbases(Variation(dna"ATCGA", "C3G")) == dna"C"`
  2. `altbases(Variation(dna"ATCGA", "C3G")) == dna"G"`
  3. `refbases(Variation(dna"ATCGA", "Δ3-3")) == dna"TC"`
  4. `altbases(Variation(dna"ATCGA", "Δ3-3")) == dna"T"`
  5. `refbases(Variation(dna"ATCGA", "3A")) == dna"C"`
  6. `altbases(Variation(dna"ATCGA", "3A")) == dna"CA"`
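
The padding rule behind these examples can be sketched without the package at all. The block below is only an illustration under stated assumptions: plain `String`s stand in for `BioSequence`, and `SketchSubstitution`, `SketchDeletion`, `SketchInsertion`, `sketch_refbases` and `sketch_altbases` are hypothetical names, not part of SequenceVariation.jl. The insertion branch simply mirrors examples 5 and 6 and leaves the position-1 case out, since no example covers it here.

```julia
# Hypothetical stand-ins for the package's edit types (not the real API).
struct SketchSubstitution
    pos::Int    # substituted reference position
    base::Char  # replacement base
end

struct SketchDeletion
    pos::Int  # first deleted reference position
    len::Int  # number of deleted bases
end

struct SketchInsertion
    pos::Int     # reference position used as the anchor base (as in examples 5 and 6)
    seq::String  # inserted bases
end

# Substitution: neither allele is empty, so no padding base is needed.
sketch_refbases(ref::String, e::SketchSubstitution) = string(ref[e.pos])
sketch_altbases(ref::String, e::SketchSubstitution) = string(e.base)

# Deletion: pad with the base before the event, or with the base after it
# when the deletion starts at position 1. Assumes ASCII input and that the
# deletion does not span the entire reference.
function sketch_refbases(ref::String, e::SketchDeletion)
    stop = e.pos + e.len - 1
    return e.pos == 1 ? ref[1:stop+1] : ref[e.pos-1:stop]
end

function sketch_altbases(ref::String, e::SketchDeletion)
    stop = e.pos + e.len - 1
    return e.pos == 1 ? string(ref[stop+1]) : string(ref[e.pos-1])
end

# Insertion: the anchor base alone is REF; anchor plus inserted bases is ALT.
sketch_refbases(ref::String, e::SketchInsertion) = string(ref[e.pos])
sketch_altbases(ref::String, e::SketchInsertion) = string(ref[e.pos]) * e.seq

# The six examples above, restated against the sketch.
@assert sketch_refbases("ATCGA", SketchSubstitution(3, 'G')) == "C"
@assert sketch_altbases("ATCGA", SketchSubstitution(3, 'G')) == "G"
@assert sketch_refbases("ATCGA", SketchDeletion(3, 1)) == "TC"
@assert sketch_altbases("ATCGA", SketchDeletion(3, 1)) == "T"
@assert sketch_refbases("ATCGA", SketchInsertion(3, "A")) == "C"
@assert sketch_altbases("ATCGA", SketchInsertion(3, "A")) == "CA"
```

A deletion at position 1 exercises the "base after the event" branch: `sketch_refbases("ATCGA", SketchDeletion(1, 1))` gives `"AT"` and the matching alternate is `"T"`.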

Current behavior

Neither function exists yet.

Possible implementation

This should be fairly straightforward using the `leftposition` and `rightposition` functions.

Context

These functions provide a snapshot of what changed in a Variation. They will allow trivial export to VCF format if interchange is required (my primary use case).
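
To illustrate why export becomes trivial once REF and ALT are available, here is a rough sketch of formatting a single VCF data line. `sketch_vcf_line` is a hypothetical helper, not an existing function in any BioJulia package; it takes already-computed REF and ALT strings and fills the remaining fixed columns with the VCF missing-value marker `.`. The contig name in the example is a placeholder.

```julia
# Hypothetical helper (not part of SequenceVariation.jl): build one
# tab-separated VCF data line from precomputed REF and ALT strings.
function sketch_vcf_line(chrom::AbstractString, pos::Integer,
                         ref::AbstractString, alt::AbstractString)
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO.
    # "." marks a missing value for ID, QUAL, FILTER and INFO.
    fields = (chrom, string(pos), ".", ref, alt, ".", ".", ".")
    return join(fields, '\t')
end

# Deleting position 3 of ATCGA: REF is "TC", ALT is "T", and POS is 2
# because POS refers to the first base of REF (the padding base).
@assert sketch_vcf_line("chr1", 2, "TC", "T") == "chr1\t2\t.\tTC\tT\t.\t.\t."
```

Note that for padded indels the POS column points at the padding base rather than at the first changed base, so an exporter would need the position adjusted alongside REF and ALT.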