
Primary language: Java. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).


Aperture: Alignment-free detection of structural variations and viral integrations in circulating tumor DNA

Aperture is a new alignment-free SV caller designed for cfDNA datasets. It combines k-mer-based searching, fast breakpoint detection using binary labels, and candidate clustering to detect SVs and viral integrations with high sensitivity, especially when junctions span repetitive regions, and then applies a barcode-based filter to ensure specificity. Aperture takes paired-end reads in FASTQ format as input and reports all SVs and viral integrations in VCF 4.2 format.
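To make the idea of k-mer-based searching concrete, here is a toy sketch. It is not Aperture's implementation: it simply packs each k-mer into a long (2 bits per base), indexes a reference string in a HashMap, and looks up a read's k-mers. The class name ToyKmerIndex and the limit of k at most 31 are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy k-mer index (hypothetical, NOT Aperture's code): illustrates
 * k-mer-based searching with 2-bit packed k-mers stored in a hash map.
 */
class ToyKmerIndex {
    private final int k;
    private final Map<Long, List<Integer>> positions = new HashMap<>();

    ToyKmerIndex(String reference, int k) {
        // 2 bits per base: k <= 31 keeps every code a non-negative long.
        if (k < 1 || k > 31) throw new IllegalArgumentException("k must be in [1, 31]");
        this.k = k;
        for (int i = 0; i + k <= reference.length(); i++) {
            long code = encode(reference, i, k);
            if (code >= 0) {
                positions.computeIfAbsent(code, c -> new ArrayList<>()).add(i);
            }
        }
    }

    /** Packs s[start, start + k) as A=0, C=1, G=2, T=3; returns -1 on any non-ACGT base. */
    static long encode(String s, int start, int k) {
        long code = 0;
        for (int i = start; i < start + k; i++) {
            int b;
            switch (Character.toUpperCase(s.charAt(i))) {
                case 'A': b = 0; break;
                case 'C': b = 1; break;
                case 'G': b = 2; break;
                case 'T': b = 3; break;
                default: return -1; // N or other symbol: k-mer is skipped
            }
            code = (code << 2) | b;
        }
        return code;
    }

    /** Reference start positions of the k-mer beginning at read[offset]; empty if none. */
    List<Integer> lookup(String read, int offset) {
        long code = encode(read, offset, k);
        if (code < 0) return new ArrayList<>();
        return positions.getOrDefault(code, new ArrayList<>());
    }
}
```

For example, new ToyKmerIndex("ACGTACGTTT", 4).lookup("ACGT", 0) returns the positions [0, 4]. A real caller also has to deal with reverse complements, repeats and memory layout, which this sketch ignores.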

If you have any trouble running Aperture, please raise an issue in this repository's Issues tab.


Software and Hardware Requirements

Software Requirements

To run Aperture, Java 1.8 or later must be installed on your system.

Hardware Requirements

  • CPU: Aperture does not require or benefit from any specific modern CPU feature, but more physical cores and a faster clock speed will significantly improve performance.
  • Memory: Aperture typically needs 40 GB for index building and 30 GB for SV calling on the human genome (hg19 or hg38). The exact requirement depends on many factors, including the reference genome, sequencing depth, cfDNA insert size and sample quality.
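Because Aperture's memory is set through the JVM's -Xmx option (the examples in this README use -Xmx40g and -Xmx30g), a quick pre-flight check can compare the maximum heap against these figures. This is a hypothetical helper, not part of Aperture; the 5% tolerance is an assumption that allows for JVMs whose Runtime.maxMemory() reports slightly less than the -Xmx value.

```java
/**
 * Hypothetical pre-flight check (not part of Aperture): compares the JVM's
 * maximum heap with the README's figures, treating "GB" as GiB to match -Xmx40g.
 */
class HeapCheck {
    static final long GIB = 1024L * 1024L * 1024L;

    /** True if maxHeapBytes is at least 95% of requiredGiB (assumed tolerance). */
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredGiB) {
        return maxHeapBytes >= (long) (requiredGiB * GIB * 0.95);
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GiB%n", max / (double) GIB);
        System.out.println("Enough for index building (40 GB): " + hasEnoughHeap(max, 40));
        System.out.println("Enough for SV calling (30 GB): " + hasEnoughHeap(max, 30));
    }
}
```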

Running

Aperture takes an Aperture index and a set of cfDNA read files as input and outputs SV results in VCF format.
Pre-compiled binaries are available at https://github.com/liuhc8/Aperture/releases.

Building an Aperture index

Aperture needs an indexed sequence file (in FASTA and FAI format) and a corresponding common SNP database (in VCF format) to build an Aperture index. If the FAI file is missing, you can create one with the faidx command in samtools. Aperture outputs a set of 5 files with the suffixes .ci, .tt, .km, .long.km and .spaced.km. Together these files constitute the index; once the index is built, Aperture no longer uses the original FASTA file.
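A small sanity check can confirm that an index has all five files before the call step is run. This is a hypothetical helper, not an Aperture command, and it assumes each file is named <prefix><suffix> (for example aperture_hg19.ci when -O aperture_hg19 is used); verify the actual file names produced by your build.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: reports which Aperture index files are missing.
 * ASSUMPTION: files are named prefix + suffix, e.g. "aperture_hg19.ci".
 */
class IndexCheck {
    static final List<String> SUFFIXES =
            Arrays.asList(".ci", ".tt", ".km", ".long.km", ".spaced.km");

    /** Paths (prefix + suffix) that do not exist as regular files. */
    static List<String> missingFiles(String prefix) {
        List<String> missing = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            File f = new File(prefix + suffix);
            if (!f.isFile()) missing.add(f.getPath());
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingFiles(args.length > 0 ? args[0] : "aperture_hg19");
        System.out.println(missing.isEmpty() ? "Index complete" : "Missing: " + missing);
    }
}
```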

The human reference genomes and the corresponding common SNP databases can be downloaded here: hg19 hg38

Pre-built Aperture indexes for hg19 and hg38 are available here: hg19 hg38

A pre-built toy index including chr21 is available here: toy index

Command-line arguments

Usage: java -jar aperture.jar index -R <genome.fa> -V <snp.vcf> -O <out> -T <threads>
Argument           Description
-h, --help         Show help message
-O, --out          Output path
-R, --reference    Genome FASTA file with FAI index
-T, --threads      Number of threads
-V, --vcf          Common SNP database for the corresponding genome

Example

java -Xmx40g -jar aperture.jar index -R hg19.fa -V dbsnp_common_hg19.vcf -O aperture_hg19 -T 30

Detecting SVs and viral integrations

Aperture needs a pair of FASTQ files and an Aperture index as input. The output is in compressed VCF format (.vcf.gz). Aperture supports a barcode-based filter to ensure specificity: if your dataset was produced by deep sequencing and contains barcodes as unique molecular identifiers, use the parameters -1BS, -2BS, -1BL, -2BL, -1S and -2S to specify the location of the barcodes in each read.

The following diagram gives a brief introduction to barcode-related parameters:

[Diagram: barcode-related parameters]

Command-line arguments

Usage: java -jar aperture.jar call -1 <arg> -1BL <arg> -1BS <arg> -1S <arg> -2 <arg> -2BL <arg> -2BS <arg> -2S <arg> -D <arg> [-H] -I <arg> -P <arg> -T <arg>
Argument             Description
-1, --r1             Path of R1.fq.gz
-1BL, --r1BarLen     Length of barcode in R1
-1BS, --r1BarStart   Barcode start index in R1 (0-based)
-1S, --r1InsStart    ctDNA fragment start index in R1 (0-based)
-2, --r2             Path of R2.fq.gz
-2BL, --r2BarLen     Length of barcode in R2
-2BS, --r2BarStart   Barcode start index in R2 (0-based)
-2S, --r2InsStart    ctDNA fragment start index in R2 (0-based)
-D, --dir            Output path
-H, --help           Show help message
-I, --index          Path of Aperture index
-P, --project        Project name
-T, --threads        Number of threads
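To illustrate what the barcode parameters describe, the sketch below slices one read using a 0-based barcode start (-1BS/-2BS), a barcode length (-1BL/-2BL) and a 0-based insert start (-1S/-2S). The ReadLayout class is hypothetical, and it assumes the cfDNA insert runs from the insert start to the end of the read; Aperture's internal handling may differ.

```java
/**
 * Hypothetical illustration of the -xBS / -xBL / -xS parameters for one read.
 * ASSUMPTION: the insert extends from insStart to the end of the read.
 */
class ReadLayout {
    final int barStart; // -1BS / -2BS, 0-based
    final int barLen;   // -1BL / -2BL, 0 means no barcode in this read
    final int insStart; // -1S  / -2S, 0-based

    ReadLayout(int barStart, int barLen, int insStart) {
        if (barStart < 0 || barLen < 0 || insStart < 0) {
            throw new IllegalArgumentException("offsets must be non-negative");
        }
        this.barStart = barStart;
        this.barLen = barLen;
        this.insStart = insStart;
    }

    /** Barcode bases; an empty string when barLen is 0. */
    String barcode(String seq) {
        return seq.substring(barStart, barStart + barLen);
    }

    /** Bases from insStart to the end of the read. */
    String insert(String seq) {
        return seq.substring(insStart);
    }
}
```

With the settings from the example command (-1BS 0 -1BL 8 -1S 8 for R1; -2BS 0 -2BL 0 -2S 0 for R2), the first 8 bases of R1 are the barcode and the rest is insert, while R2 carries no barcode.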

Example

curl -L https://ndownloader.figshare.com/files/26914970 --output test_bar_R1.fq.gz
curl -L https://ndownloader.figshare.com/files/26914973 --output test_bar_R2.fq.gz
curl -L https://ndownloader.figshare.com/files/26914805 --output chr21.tar.gz
tar -vxf chr21.tar.gz
java -Xmx30g -jar aperture.jar call -1 test_bar_R1.fq.gz -2 test_bar_R2.fq.gz -I hg38_small -D ./ -P test -1BS 0 -2BS 0 -1BL 8 -2BL 0 -1S 8 -2S 0 -T 4

The expected output, test_toyindex_ap12.sv.vcf.gz, is available in the example folder of this repository.
The expected runtime of this test sample is about 15 seconds using 4 threads.

Output interpretation

In Aperture, all SVs are described as breakends, so every record in the Aperture VCF carries the tag “SVTYPE=BND” in the INFO field.

Aperture VCF output follows the VCF 4.2 spec. All custom fields are described in the VCF header.
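Breakend records encode the mate position and joining orientation in the ALT column using the VCF 4.2 bracket notation (t[p[, t]p], ]p]t, [p[t). The generic parser below is written from the VCF 4.2 specification rather than from Aperture's code; it extracts the mate chromosome and position plus two orientation flags. How these flags correspond to Aperture's STRANDS field is not documented here, so treat that mapping as unknown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal parser for mated VCF 4.2 breakend ALT strings (generic, not Aperture's code). */
class Breakend {
    // 1: bases before bracket, 2: first bracket, 3: mate chrom, 4: mate pos,
    // 5: second bracket, 6: bases after bracket. Greedy group 3 allows ':' in contig names.
    private static final Pattern ALT = Pattern.compile(
            "^([A-Za-z]*)([\\[\\]])([^\\[\\]]+):(\\d+)([\\[\\]])([A-Za-z]*)$");

    final String mateChrom;
    final long matePos;
    /** true for t[p[ and t]p]: the mate sequence is joined after the REF base(s). */
    final boolean joinedAfterRef;
    /** true for '[' forms: the joined mate piece extends to the right of p. */
    final boolean mateExtendsRight;

    private Breakend(String mateChrom, long matePos, boolean joinedAfterRef, boolean mateExtendsRight) {
        this.mateChrom = mateChrom;
        this.matePos = matePos;
        this.joinedAfterRef = joinedAfterRef;
        this.mateExtendsRight = mateExtendsRight;
    }

    static Breakend parse(String alt) {
        Matcher m = ALT.matcher(alt);
        // Brackets must match, and exactly one side of the brackets carries bases.
        if (!m.matches() || !m.group(2).equals(m.group(5))
                || m.group(1).isEmpty() == m.group(6).isEmpty()) {
            throw new IllegalArgumentException("not a mated breakend ALT: " + alt);
        }
        return new Breakend(m.group(3), Long.parseLong(m.group(4)),
                !m.group(1).isEmpty(), m.group(2).equals("["));
    }
}
```

For instance, parsing G]17:198982] gives the mate 17:198982, joined after the REF base, with the mate piece extending to the left of the mate position.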

VCF FILTER Fields

ID            Description
LOW_QUAL      Low-quality call
FAKE_BP       False positive variant caused by imprecise k-mer-based mapping
SMALL_EVENT   Event size is smaller than the minimum reportable size
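Since the output is a gzip-compressed VCF, one way to see how calls are distributed across these filters is to stream the file and count FILTER values. This generic sketch uses java.util.zip.GZIPInputStream; the class name is hypothetical, and it assumes that unfiltered records carry PASS (as the VCF 4.2 specification recommends) and that multiple filters are separated by semicolons.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: counts FILTER values in a .vcf.gz file. */
class FilterSummary {
    /** A record with "LOW_QUAL;FAKE_BP" is counted once under each filter. */
    static Map<String, Integer> countFilters(String path) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(path)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty() || line.startsWith("#")) continue; // header lines
                String[] cols = line.split("\t");
                if (cols.length < 8) continue;                         // malformed record
                for (String f : cols[6].split(";")) {                  // column 7 = FILTER
                    counts.merge(f, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countFilters(args[0]));
    }
}
```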

VCF INFO Fields

ID         Description
SVTYPE     Type of structural variant
STRANDS    Strand orientation of the adjacency
REFQUA     K-mer mapping quality of the reference junction
VARQUA     K-mer mapping quality of the variant junction
REFKMER    Average number of k-mers supporting the reference junction
VARKMER    Average number of k-mers supporting the variant junction
BPSEQQUA   Quality of the sequence spanning the breakpoint junction
PARID      ID of the partner breakend
HOMLEN     Length of base pair identical micro-homology at event breakpoints
HOMSEQ     Sequence of base pair identical micro-homology at event breakpoints
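The INFO column is a semicolon-separated list of KEY=VALUE pairs (flags have no value), and PARID links the two breakends of an adjacency. The sketch below parses INFO into a map and then pairs records through PARID; the class and method names are hypothetical, and records whose PARID points to an ID that is not present are simply skipped.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helpers for the INFO column of breakend records. */
class InfoParser {
    /** "SVTYPE=BND;PARID=bnd2" -> {SVTYPE=BND, PARID=bnd2}; flags map to ""; "." -> empty map. */
    static Map<String, String> parseInfo(String info) {
        Map<String, String> out = new LinkedHashMap<>();
        if (info.equals(".")) return out;
        for (String field : info.split(";")) {
            int eq = field.indexOf('=');
            if (eq < 0) out.put(field, "");                      // flag without a value
            else out.put(field.substring(0, eq), field.substring(eq + 1));
        }
        return out;
    }

    /** Maps record ID -> partner ID using PARID, keeping only partners that are present. */
    static Map<String, String> pairBreakends(Map<String, Map<String, String>> infoById) {
        Map<String, String> pairs = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> e : infoById.entrySet()) {
            String partner = e.getValue().get("PARID");
            if (partner != null && infoById.containsKey(partner)) {
                pairs.put(e.getKey(), partner);
            }
        }
        return pairs;
    }
}
```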

VCF FORMAT Fields

ID      Description
GT      Genotype (not applicable)
SR      Count of split reads supporting the breakpoint
PE      Count of paired-end reads supporting the breakpoint
REFSR   Count of split reads supporting the reference junction
VARSR   Count of split reads supporting the variant junction
BAR     Count of cfDNA molecules supporting the breakpoint
UBAR    Count of cfDNA molecules supported by only one read
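The FORMAT column lists the keys and each sample column holds the matching colon-separated values. A minimal sketch that zips them together and reads an integer count such as SR, PE, BAR or UBAR follows; the class name, the key order used in the test, and the -1 sentinel for missing values are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for FORMAT/sample columns of a VCF record. */
class SampleParser {
    /** Zips FORMAT keys with sample values; trailing keys without values are omitted. */
    static Map<String, String> parseSample(String format, String sample) {
        String[] keys = format.split(":");
        String[] vals = sample.split(":");
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < keys.length && i < vals.length; i++) {
            out.put(keys[i], vals[i]);
        }
        return out;
    }

    /** Integer value of a count field; -1 if the key is absent or the value is ".". */
    static int count(Map<String, String> sample, String key) {
        String v = sample.get(key);
        if (v == null || v.equals(".")) return -1;
        return Integer.parseInt(v);
    }
}
```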

Publication

For citing Aperture and for an overview of the Aperture algorithms, refer to the following article:

Aperture: alignment-free detection of structural variations and viral integrations in circulating tumor DNA. Hongchao Liu, Huihui Yin, Guangyu Li, Junling Li, Xiaoyue Wang. Brief Bioinform. 2021;bbab290. doi:10.1093/bib/bbab290

See the publication page for links to the simulation datasets.