GATB/MindTheGap

result issue

I have a problem with the result of MindTheGap.
I simulated 1000 variants in chr15.fa, including 524 insertions and 476 deletions, with SURVIVOR and ART. I then ran MindTheGap in find and fill modes, as shown in the README:
MindTheGap find -in pair-end1.fq,pair-end2.fq -ref ../chr15/chr15.fa -out mindthegap
MindTheGap fill -graph mindthegap.h5 -bkpt mindthegap.breakpoints -out mind-result
Finally, I got 507 insertions in mind-result.insertion.vcf. The breakpoints shown in the VCF file are very different from the simulated data. Do the positions in the VCF file correspond to the simulated insertion breakpoints?
Did I miss something or do something wrong?
I hope you can reply soon, and I would be grateful for any clues.

Hi,
Yes, the positions indicated in the VCF file are the positions of the insertion sites on the chromosome. They can differ slightly from the exact simulated positions because insertions are left-normalized in the VCF: if repeats make the breakpoint "fuzzy" and several consecutive positions are possible, MindTheGap reports the left-most one.
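To illustrate why a left-normalized position can differ from the simulated one, here is a minimal Python sketch of left-normalizing an insertion. It is a generic illustration, not MindTheGap's actual code: the function name `left_normalize` and the 0-based coordinate convention (the insertion sits just before `ref[pos]`) are assumptions made for this example.

```python
def left_normalize(ref, pos, ins):
    """Shift an insertion as far left as possible without changing the result.

    Assumption for this sketch: `pos` is 0-based and the inserted sequence
    `ins` is placed between ref[pos - 1] and ref[pos].
    """
    # While the base just left of the site equals the last inserted base,
    # rotating the insertion one base to the left yields the same final sequence.
    while pos > 0 and ins and ref[pos - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]
        pos -= 1
    return pos, ins


ref = "GGATATATCC"
# Simulated insertion of "AT" at the right end of the AT repeat.
pos, ins = left_normalize(ref, 8, "AT")
# Both placements produce the same mutated sequence.
assert ref[:8] + "AT" + ref[8:] == ref[:pos] + ins + ref[pos:]
print(pos, ins)
```

In this toy case the insertion simulated at position 8 would be reported at position 2, the left end of the repeat, even though both describe the same mutated sequence.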
How different are the reported positions?
Are the reported insertion sequences correct, or at least of the expected size?
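One way to answer these two questions is to match each simulated position to the nearest reported one within a tolerance and look at the offsets. The following Python sketch assumes you have already extracted both position lists (for example from the SURVIVOR output and from the POS column of the VCF); the function names, the default tolerance of 10 bp, and the toy positions are all hypothetical.

```python
from bisect import bisect_left


def nearest_within(sorted_positions, target, tolerance):
    """Return the position closest to `target` within `tolerance`, or None."""
    i = bisect_left(sorted_positions, target)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda p: abs(p - target), default=None)
    if best is None or abs(best - target) > tolerance:
        return None
    return best


def compare_positions(simulated, reported, tolerance=10):
    """Pair simulated positions with reported ones; return (matched, missed)."""
    reported = sorted(reported)
    matched, missed = [], []
    for pos in sorted(simulated):
        hit = nearest_within(reported, pos, tolerance)
        if hit is None:
            missed.append(pos)
        else:
            matched.append((pos, hit, hit - pos))  # offset = reported - simulated
    return matched, missed


# Toy positions for illustration only.
matched, missed = compare_positions([100, 500, 900], [98, 505, 2000])
print(matched, missed)
```

If most offsets are small, the calls are likely consistent with the simulation; negative offsets in particular would be compatible with left-normalization. The same pairing can be reused to compare inserted sequence lengths.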
You can also check that the de Bruijn graph built with the default parameters represents your sequenced genome well: in the standard output of find or fill, how many solid k-mers do you get? If the minimal abundance parameter is properly estimated, the number of solid k-mers should be roughly equal to the size of your sequenced genome.
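As a rough aid for this check, the Python sketch below counts the non-N bases in a FASTA file and computes the ratio of solid k-mers to genome size. The solid k-mer count has to be read by hand from the find or fill output; the function names are hypothetical, and no particular log format is assumed.

```python
def count_bases(fasta_lines):
    """Count non-N bases in an iterable of FASTA lines (headers are skipped)."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += sum(1 for base in line.upper() if base != "N")
    return total


def kmer_ratio(solid_kmers, genome_size):
    """Ratio of solid k-mers to genome size; a value near 1 is the rough target."""
    return solid_kmers / genome_size


# Toy FASTA content for illustration; with a real file, pass open("chr15.fa").
toy = [">toy", "ACGTNN", "acgt"]
print(count_bases(toy))
```

A ratio far below 1 would suggest that part of the genome is missing from the graph, while a ratio far above 1 would suggest that erroneous k-mers were kept.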

I hope it helps,
Claire