Same variant appearing in 2 different sets.
Closed this issue · 6 comments
Hello, I am comparing 8 gVCF files.
I noticed that some variants appear in more than one set: exactly 33 out of a total of 750000 variants, so a very small percentage. I don't see what is special about them.
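For anyone who wants to reproduce a list like dup.txt, here is a minimal sketch of one way to find records that occur in more than one set file. It is not necessarily how the list above was produced. The function names `find_shared_variants` and `list_records` are hypothetical, and the sketch assumes that a variant is identified by CHROM, POS, REF and ALT and that every `*.vcf.gz` in the current directory is one set.

```shell
# Hypothetical helper: reads tab-separated lines of the form
#   SET <TAB> CHROM <TAB> POS <TAB> REF <TAB> ALT
# on stdin and prints every CHROM/POS/REF/ALT key that occurs in more
# than one set, followed by a comma-separated list of those sets.
find_shared_variants() {
  awk -F'\t' '
    {
      key = $2 "\t" $3 "\t" $4 "\t" $5
      if (!seen[key, $1]++) {
        # First time this set reports this variant: append the set name.
        sets[key] = (n[key]++ ? sets[key] "," : "") $1
      }
    }
    END {
      for (key in n)
        if (n[key] > 1) print key "\t" sets[key]
    }
  ' | sort
}

# Assumed driver: one line per record for every set file in the
# current directory, prefixed with the file name.
list_records() {
  for f in *.vcf.gz; do
    bcftools query -f "${f}\t%CHROM\t%POS\t%REF\t%ALT\n" "$f"
  done
}
```

With the set files in place, `list_records | find_shared_variants` would print one line per shared variant. Keying on REF and ALT as well as the position avoids flagging two different alleles at the same site as the same variant.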
while read -r chr pos _; do for i in *.vcf.gz; do bcftools view -H "$i" | awk -v c="$chr" -v p="$pos" '$1 == c && $2 == p' | grep . && echo "$i"; done; done < dup.txt
Chrom_2 12461656 . T C,<*> 25.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:25:37:23,14,0:0.378378,0:25,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 12461656 . T C,<*> 12.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:13:36:23,13,0:0.361111,0:12,0,40,990,990,990
BG.vcf.gz
Chrom_2 12461657 . C T,<*> 26.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:26:37:23,14,0:0.378378,0:26,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 12461657 . C T,<*> 13.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:14:37:24,13,0:0.351351,0:13,0,41,990,990,990
BG.vcf.gz
Chrom_2 14800309 . T A,<*> 24.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:24:39:17,22,0:0.564103,0:24,0,48,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 14800309 . T A,<*> 5.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:5:52:37,15,0:0.288462,0:3,0,39,990,990,990
FG.vcf.gz
Chrom_2 5899645 . T C,<*> 24.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:25:58:29,29,0:0.5,0:24,0,54,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 5899645 . T C,<*> 7.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:8:34:29,5,0:0.147059,0:7,0,33,990,990,990
G.vcf.gz
Chrom_2 9005633 . A C,<*> 6.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:7:38:22,16,0:0.421053,0:5,0,41,990,990,990
ABCDEGH.vcf.gz
Chrom_2 9005633 . A C,<*> 12.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:13:60:42,18,0:0.3,0:12,0,37,990,990,990
G.vcf.gz
Chrom_2 13923379 . G A,<*> 14.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:15:25:15,10,0:0.4,0:14,0,44,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 13923379 . G A,<*> 10.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:10:38:26,12,0:0.315789,0:9,0,37,990,990,990
G.vcf.gz
Chrom_2 13923381 . A C,<*> 14.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:14:31:21,10,0:0.322581,0:14,0,41,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 13923381 . A C,<*> 6.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:6:49:37,12,0:0.244898,0:5,0,35,990,990,990
G.vcf.gz
Chrom_3 5537602 . G A,<*> 47.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:48:65:33,32,0:0.492308,0:47,0,150,990,990,990
ACDEFGH.vcf.gz
Chrom_3 5537602 . G A,<*> 9.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:10:39:24,15,0:0.384615,0:9,0,42,990,990,990
G.vcf.gz
Chrom_3 5537605 . A G,<*> 46.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:46:67:35,32,0:0.477612,0:46,0,71,990,990,990
ACDEFGH.vcf.gz
Chrom_3 5537605 . A G,<*> 15.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:16:51:36,15,0:0.294118,0:15,0,43,990,990,990
G.vcf.gz
Chrom_4 2859654 . C T,<*> 46.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:46:42:18,24,0:0.571429,0:46,0,57,990,990,990
ABCDEFGH.vcf.gz
Chrom_4 2859654 . C T,<*> 11.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:11:44:38,6,0:0.136364,0:10,0,39,990,990,990
G.vcf.gz
Chrom_4 2859656 . G T,<*> 37.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:38:43:18,24,0:0.55814,0:37,0,52,990,990,990
ABCDEFGH.vcf.gz
Chrom_4 2859656 . G T,<*> 9.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:9:39:31,6,0:0.153846,0:8,0,37,990,990,990
G.vcf.gz
Chrom_4 5073318 . A T,<*> 14.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:14:39:22,16,0:0.410256,0:14,0,35,990,990,990
ACDFH.vcf.gz
Chrom_4 5073318 . A T,<*> 4.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:41:33,8,0:0.195122,0:2,0,31,990,990,990
BCDEFH.vcf.gz
Chrom_2 6000828 . G A,<*> 57.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:57:67:33,33,0:0.492537,0:57,0,74,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 6000828 . G A,<*> 4.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:50:39,11,0:0.22,0:1,0,36,990,990,990
D.vcf.gz
Chrom_2 6000836 . C G,<*> 52.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:52:63:31,32,0:0.507937,0:52,0,150,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 6000836 . C G,<*> 5.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:6:37:25,12,0:0.324324,0:4,0,39,990,990,990
D.vcf.gz
Chrom_4 5073318 . A T,<*> 14.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:14:39:22,16,0:0.410256,0:14,0,35,990,990,990
ACDFH.vcf.gz
Chrom_4 5073318 . A T,<*> 4.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:41:33,8,0:0.195122,0:2,0,31,990,990,990
BCDEFH.vcf.gz
Chrom_2 7708865 . T A,<*> 33 PASS . GT:GQ:DP:AD:VAF:PL 0/1:33:51:31,20,0:0.392157,0:33,0,62,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 7708865 . T A,<*> 6.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:6:42:31,11,0:0.261905,0:5,0,35,990,990,990
BF.vcf.gz
Chrom_2 7708867 . A C,<*> 28.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:28:48:30,18,0:0.375,0:28,0,57,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 7708867 . A C,<*> 9.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:10:41:30,11,0:0.268293,0:9,0,37,990,990,990
BF.vcf.gz
Chrom_2 14800309 . T A,<*> 24.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:24:39:17,22,0:0.564103,0:24,0,48,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 14800309 . T A,<*> 5.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:5:52:37,15,0:0.288462,0:3,0,39,990,990,990
FG.vcf.gz
Chrom_4 3627373 . A G,<*> 13.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:13:44:27,17,0:0.386364,0:12,0,48,990,990,990
ABCDEFGH.vcf.gz
Chrom_4 3627373 . A G,<*> 4.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:5:51:37,14,0:0.27451,0:3,0,40,990,990,990
F.vcf.gz
Chrom_5 8054576 . C A,<*> 24.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:24:46:27,19,0:0.413043,0:24,0,50,990,990,990
ABCDEFGH.vcf.gz
Chrom_5 8054576 . C A,<*> 5.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:6:39:25,14,0:0.358974,0:4,0,41,990,990,990
F.vcf.gz
Chrom_5 8054580 . A C,<*> 17.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:18:44:27,17,0:0.386364,0:17,0,45,990,990,990
ABCDEFGH.vcf.gz
Chrom_5 8054580 . A C,<*> 3.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:3:46:35,11,0:0.23913,0:0,0,32,990,990,990
F.vcf.gz
Chrom_4 5073318 . A T,<*> 14.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:14:39:22,16,0:0.410256,0:14,0,35,990,990,990
ACDFH.vcf.gz
Chrom_4 5073318 . A T,<*> 4.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:41:33,8,0:0.195122,0:2,0,31,990,990,990
BCDEFH.vcf.gz
Chrom_3 7892042 . G C,<*> 35 PASS . GT:GQ:DP:AD:VAF:PL 0/1:35:59:21,38,0:0.644068,0:34,0,54,990,990,990
ABCDEFGH.vcf.gz
Chrom_3 7892042 . G C,<*> 3.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:3:44:33,11,0:0.25,0:0,0,30,990,990,990
E.vcf.gz
Chrom_3 7892043 . C A,<*> 38.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:38:59:21,38,0:0.644068,0:38,0,55,990,990,990
ABCDEFGH.vcf.gz
Chrom_3 7892043 . C A,<*> 4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:63:52,11,0:0.174603,0:1,0,30,990,990,990
E.vcf.gz
Chrom_4 13530156 . T A,<*> 24.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:25:57:29,28,0:0.491228,0:24,0,60,990,990,990
ABCDEFGH.vcf.gz
Chrom_4 13530156 . T A,<*> 5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:5:62:46,16,0:0.258065,0:3,0,36,990,990,990
E.vcf.gz
Chrom_4 13530161 . A T,<*> 16.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:16:57:30,27,0:0.473684,0:16,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_4 13530161 . A T,<*> 4.8 PASS . GT:GQ:DP:AD:VAF:PL 0/1:5:73:59,14,0:0.191781,0:2,0,35,990,990,990
E.vcf.gz
Chrom_2 7708865 . T A,<*> 33 PASS . GT:GQ:DP:AD:VAF:PL 0/1:33:51:31,20,0:0.392157,0:33,0,62,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 7708865 . T A,<*> 6.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:6:42:31,11,0:0.261905,0:5,0,35,990,990,990
BF.vcf.gz
Chrom_2 7708867 . A C,<*> 28.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:28:48:30,18,0:0.375,0:28,0,57,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 7708867 . A C,<*> 9.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:10:41:30,11,0:0.268293,0:9,0,37,990,990,990
BF.vcf.gz
Chrom_2 12461656 . T C,<*> 25.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:25:37:23,14,0:0.378378,0:25,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 12461656 . T C,<*> 12.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:13:36:23,13,0:0.361111,0:12,0,40,990,990,990
BG.vcf.gz
Chrom_2 12461657 . C T,<*> 26.3 PASS . GT:GQ:DP:AD:VAF:PL 0/1:26:37:23,14,0:0.378378,0:26,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_2 12461657 . C T,<*> 13.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:14:37:24,13,0:0.351351,0:13,0,41,990,990,990
BG.vcf.gz
Chrom_4 5073318 . A T,<*> 14.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:14:39:22,16,0:0.410256,0:14,0,35,990,990,990
ACDFH.vcf.gz
Chrom_4 5073318 . A T,<*> 4.4 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:41:33,8,0:0.195122,0:2,0,31,990,990,990
BCDEFH.vcf.gz
Chrom_6 8358346 . T C,<*> 25.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:25:46:19,27,0:0.586957,0:25,0,44,990,990,990
ABCDEGH.vcf.gz
Chrom_6 8358346 . T C,<*> 3.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:44:29,13,0:0.295455,0:1,0,37,990,990,990
FH.vcf.gz
Chrom_6 4595590 . C T,<*> 37.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:38:59:33,26,0:0.440678,0:37,0,58,990,990,990
ABCDEFGH.vcf.gz
Chrom_6 4595590 . C T,<*> 3.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:4:57:46,11,0:0.192982,0:1,0,29,990,990,990
H.vcf.gz
Thank you.
Are you able to provide the input VCFs that generate these results? Even a small region that emits one of the duplicates would be helpful.
No worries
I am sending you a VCF for chromosome 2.
Thanks, would you also be able to provide the SDF or FASTA reference that you used?
The problem is that RTG Tools vcfeval emits a true positive and a false positive for these records when comparing two of the VCFs. I'm not 100% sure that vcfeval supports 'gVCF' to be honest... I can ask on the rtg-users forum.
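To check which records end up as both a true positive and a false positive, something along these lines might help. It is only a sketch: the helper name `positions_in_both` and all file names (`A.vcf.gz`, `B.vcf.gz`, `reference.sdf`, `eval_out`, `tp.pos`, `fp.pos`) are hypothetical, and it assumes the usual vcfeval output directory containing `tp.vcf.gz` and `fp.vcf.gz`.

```shell
# Hypothetical helper: print the lines (for example "CHROM<TAB>POS")
# that occur in both input files. The inputs do not need to be sorted.
positions_in_both() {
  awk 'NR == FNR { first[$0]; next } $0 in first' "$1" "$2" | sort -u
}

# Assumed workflow (file names are placeholders, not from this thread):
#   rtg vcfeval -b A.vcf.gz -c B.vcf.gz -t reference.sdf -o eval_out
#   bcftools query -f '%CHROM\t%POS\n' eval_out/tp.vcf.gz > tp.pos
#   bcftools query -f '%CHROM\t%POS\n' eval_out/fp.vcf.gz > fp.pos
#   positions_in_both tp.pos fp.pos
```

Any position printed by the last step was reported by vcfeval in both the true-positive and false-positive outputs, which is the situation described above.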
Interesting, I hadn't considered the possibility that RTG might not be able to handle gVCF.
I opened a thread on the rtg-users forum to ask.
Closing following discussion on rtg-users forum. No issue with vcfeval or Starfish.