VCF positions for precise SV calls
Status: In progress
Overview
This is guide to clarify the usage of positions for POS, END, and MATE_POS fields for structural variants inferred at basepair resolution and rendered to VCF 4.1 (or higher). The VCF spec leaves many clues on this topic but there is considerable room for off-by-one errors at VCF generation and parse stages.
Conventions used in this guide
'Left' and 'Right'
All discussions below are based on the forward DNA strand, with "left" used to denote positions with a lower reference coordinate relative to "right" positions.
Breakend Homology
Most SVs will have some amount of breakpoint homology, its representation is reasonably well defined in the VCF spec already. The guide below follows the NGS community convention of standarizing variants by describing the local breakend standardized to the left-most position within its homology range.
Examples
All examples are drawn from public Platinum Genomes data for NA12878
Exact VCF position examples for common SV types:
Deletions (using VCF symbolic <DEL> allele)
Left end
POS indicates right-most position before crossing the left-shifted deletion left breakend:
Right end
END indicates right-most position before crossing the left-shifted deletion right breakend