Structural variations (SVs) are genomic rearrangements, such as deletions, insertions, inversions, duplications, and translocations, that are larger than 50 bp. A number of long-read SV callers have been proposed. However, long reads generated by Oxford Nanopore Technologies (ONT) have a high error rate, which affects the correctness of long-read alignment, and existing long-read SV callers do not perform well on such data. We propose a novel method, SVsearcher, to resolve these issues. Compared with existing methods, SVsearcher has the highest recall, precision, and F1-score.
git clone https://github.com/kensung-lab/SVsearcher.git
1. python3
2. pysam
3. cigar
4. numpy
5. pyfaidx
6. copy
7. time
8. argparse
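Before running the tool, it can help to confirm that the packages listed above are importable. The snippet below is a minimal sketch, not part of SVsearcher; the helper name `missing_modules` is hypothetical, and it assumes each package's import name matches the name in the list. Only the third-party packages are checked, since copy, time and argparse ship with Python.

```python
import importlib.util

# Third-party packages from the list above (assumed import names).
REQUIRED = ["pysam", "cigar", "numpy", "pyfaidx"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Any package reported as missing can usually be installed with pip under the same name, although that too is an assumption worth verifying for your environment.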
Sorted BAM files produced by NGMLR, Minimap, or Minimap2 can all be used as the input sorted BAM. The input reference.fa must be the same reference that was used to generate the BAM file.
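One way to catch a mismatched reference early is to compare the contig names and lengths in the BAM header with those in the FASTA file. The following is a rough sketch of such a pre-flight check and is not something SVsearcher is documented to do itself. The function names `contig_mismatches` and `load_contigs` are hypothetical. Matching names and lengths is a necessary condition only; it does not prove the sequences are identical.

```python
def contig_mismatches(bam_contigs, ref_contigs):
    """Compare two {contig name: length} maps and describe any disagreements."""
    problems = []
    for name, length in bam_contigs.items():
        if name not in ref_contigs:
            problems.append(f"{name}: in BAM header but not in reference")
        elif ref_contigs[name] != length:
            problems.append(
                f"{name}: length {length} in BAM, {ref_contigs[name]} in reference"
            )
    return problems


def load_contigs(bam_path, fasta_path):
    """Read contig lengths from a BAM header and a FASTA file."""
    # Imported here so contig_mismatches stays usable without these packages.
    import pysam
    import pyfaidx

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        bam_contigs = dict(zip(bam.references, bam.lengths))
    ref = pyfaidx.Fasta(fasta_path)
    ref_contigs = {name: len(ref[name]) for name in ref.keys()}
    return bam_contigs, ref_contigs
```

Calling `contig_mismatches(*load_contigs("sample.sorted.bam", "reference.fa"))` with your own paths (the file names here are placeholders) returns an empty list when every BAM contig is present in the reference with the same length.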
cd dist
SVsearcher <input sorted bam> <input reference.fa>
The output format is as follows. CHROM is the chromosome name, POS is the SV start position, and ID is the SV name. REF is the reference sequence and ALT is the alternate sequence. QUAL is the quality of the SV and FILTER is the filter status. INFO holds basic information about the SV, such as its type (SVTYPE), length (SVLEN), and end position (END).
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 10780 SVsearcher.INS.1 G GAACACATGCTAGCGCGTCCGGGGGTGGAGGCGATAGCGCAGGCGCAGAGAGCGCCGCGCC . PASS SVTYPE=INS;SVLEN=61;END=10780;RNAMES=NULL
chr1 30893 SVsearcher.DEL.1 catttctctctatctcatttctctctctctcgctatct c . PASS SVTYPE=DEL;SVLEN=-37;END=30930;RNAMES=NULL
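For downstream processing, each record can be split into its eight columns and the INFO field into key/value pairs. The sketch below is illustrative only; `parse_sv_line` is a hypothetical helper, and it assumes the columns are separated by whitespace (tabs or spaces) and that INFO entries use the `KEY=VALUE;KEY=VALUE` layout shown in the example records above.

```python
def parse_sv_line(line):
    """Parse one non-header output record into a dictionary."""
    # Assumes whitespace-separated columns, as in the example records.
    chrom, pos, sv_id, ref, alt, qual, flt, info = line.split()[:8]
    info_map = dict(
        item.split("=", 1) for item in info.split(";") if "=" in item
    )
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": sv_id,
        "ref": ref,
        "alt": alt,
        "qual": qual,
        "filter": flt,
        "svtype": info_map.get("SVTYPE"),
        "svlen": int(info_map["SVLEN"]) if "SVLEN" in info_map else None,
        "end": int(info_map["END"]) if "END" in info_map else None,
    }


record = parse_sv_line(
    "chr1 30893 SVsearcher.DEL.1 catttctctctatctcatttctctctctctcgctatct c . PASS "
    "SVTYPE=DEL;SVLEN=-37;END=30930;RNAMES=NULL"
)
print(record["svtype"], record["svlen"], record["end"])
```

For the deletion record above, the parsed values are SVTYPE `DEL`, SVLEN `-37`, and END `30930`, so END minus POS equals the absolute SV length.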
For advice, bug reports, or help, please contact yan.zheng@nwpu-bioinformatics.com.