NCIG-ONT-SV

The landscape of genomic structural variation in Indigenous Australians

Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets [1,2,3]. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here, we apply population-scale whole-genome long-read sequencing [4] to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large indels (20-49 bp; n=136,797), structural variants (SVs; 50 bp-50 kb; n=159,912) and regions of variable copy number (>50 kb; n=156). The majority are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of SVs appear to be exclusive to Indigenous Australians (12%, a lower-bound estimate), and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats (STRs) throughout the genome to characterise allelic diversity at fifty known disease loci [5], uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among STR sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
