dancooke/starfish

Same variant appearing in 2 different sets.

Closed this issue · 6 comments

Hello, I am comparing 8 gVCF files.
I noticed that some variants appear in more than one set. Exactly 33 variants appear in more than one set, out of a total of 750000 variants, so that's a very small fraction. I don't see what is particular about them.

while read -r chr pos _; do for i in *.vcf.gz; do bcftools view -H "$i" | awk -v c="$chr" -v p="$pos" '$1 == c && $2 == p { print; hit = 1 } END { exit !hit }' && echo "$i"; done; done < dup.txt
Chrom_2	12461656	.	T	C,<*>	25.1	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:25:37:23,14,0:0.378378,0:25,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	12461656	.	T	C,<*>	12.6	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:13:36:23,13,0:0.361111,0:12,0,40,990,990,990
BG.vcf.gz
Chrom_2	12461657	.	C	T,<*>	26.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:26:37:23,14,0:0.378378,0:26,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	12461657	.	C	T,<*>	13.9	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:14:37:24,13,0:0.351351,0:13,0,41,990,990,990
BG.vcf.gz
Chrom_2	14800309	.	T	A,<*>	24.1	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:24:39:17,22,0:0.564103,0:24,0,48,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	14800309	.	T	A,<*>	5.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:5:52:37,15,0:0.288462,0:3,0,39,990,990,990
FG.vcf.gz
Chrom_2	5899645	.	T	C,<*>	24.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:25:58:29,29,0:0.5,0:24,0,54,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	5899645	.	T	C,<*>	7.9	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:8:34:29,5,0:0.147059,0:7,0,33,990,990,990
G.vcf.gz
Chrom_2	9005633	.	A	C,<*>	6.6	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:7:38:22,16,0:0.421053,0:5,0,41,990,990,990
ABCDEGH.vcf.gz
Chrom_2	9005633	.	A	C,<*>	12.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:13:60:42,18,0:0.3,0:12,0,37,990,990,990
G.vcf.gz
Chrom_2	13923379	.	G	A,<*>	14.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:15:25:15,10,0:0.4,0:14,0,44,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	13923379	.	G	A,<*>	10.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:10:38:26,12,0:0.315789,0:9,0,37,990,990,990
G.vcf.gz
Chrom_2	13923381	.	A	C,<*>	14.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:14:31:21,10,0:0.322581,0:14,0,41,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	13923381	.	A	C,<*>	6.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:6:49:37,12,0:0.244898,0:5,0,35,990,990,990
G.vcf.gz
Chrom_3	5537602	.	G	A,<*>	47.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:48:65:33,32,0:0.492308,0:47,0,150,990,990,990
ACDEFGH.vcf.gz
Chrom_3	5537602	.	G	A,<*>	9.7	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:10:39:24,15,0:0.384615,0:9,0,42,990,990,990
G.vcf.gz
Chrom_3	5537605	.	A	G,<*>	46.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:46:67:35,32,0:0.477612,0:46,0,71,990,990,990
ACDEFGH.vcf.gz
Chrom_3	5537605	.	A	G,<*>	15.7	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:16:51:36,15,0:0.294118,0:15,0,43,990,990,990
G.vcf.gz
Chrom_4	2859654	.	C	T,<*>	46.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:46:42:18,24,0:0.571429,0:46,0,57,990,990,990
ABCDEFGH.vcf.gz
Chrom_4	2859654	.	C	T,<*>	11.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:11:44:38,6,0:0.136364,0:10,0,39,990,990,990
G.vcf.gz
Chrom_4	2859656	.	G	T,<*>	37.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:38:43:18,24,0:0.55814,0:37,0,52,990,990,990
ABCDEFGH.vcf.gz
Chrom_4	2859656	.	G	T,<*>	9.5	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:9:39:31,6,0:0.153846,0:8,0,37,990,990,990
G.vcf.gz
Chrom_4	5073318	.	A	T,<*>	14.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:14:39:22,16,0:0.410256,0:14,0,35,990,990,990
ACDFH.vcf.gz
Chrom_4	5073318	.	A	T,<*>	4.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:4:41:33,8,0:0.195122,0:2,0,31,990,990,990
BCDEFH.vcf.gz
Chrom_2	6000828	.	G	A,<*>	57.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:57:67:33,33,0:0.492537,0:57,0,74,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	6000828	.	G	A,<*>	4.1	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:4:50:39,11,0:0.22,0:1,0,36,990,990,990
D.vcf.gz
Chrom_2	6000836	.	C	G,<*>	52.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:52:63:31,32,0:0.507937,0:52,0,150,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	6000836	.	C	G,<*>	5.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:6:37:25,12,0:0.324324,0:4,0,39,990,990,990
D.vcf.gz
Chrom_4	5073318	.	A	T,<*>	14.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:14:39:22,16,0:0.410256,0:14,0,35,990,990,990
ACDFH.vcf.gz
Chrom_4	5073318	.	A	T,<*>	4.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:4:41:33,8,0:0.195122,0:2,0,31,990,990,990
BCDEFH.vcf.gz
Chrom_2	7708865	.	T	A,<*>	33	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:33:51:31,20,0:0.392157,0:33,0,62,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	7708865	.	T	A,<*>	6.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:6:42:31,11,0:0.261905,0:5,0,35,990,990,990
BF.vcf.gz
Chrom_2	7708867	.	A	C,<*>	28.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:28:48:30,18,0:0.375,0:28,0,57,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	7708867	.	A	C,<*>	9.5	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:10:41:30,11,0:0.268293,0:9,0,37,990,990,990
BF.vcf.gz
Chrom_2	14800309	.	T	A,<*>	24.1	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:24:39:17,22,0:0.564103,0:24,0,48,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	14800309	.	T	A,<*>	5.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:5:52:37,15,0:0.288462,0:3,0,39,990,990,990
FG.vcf.gz
Chrom_4	3627373	.	A	G,<*>	13.1	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:13:44:27,17,0:0.386364,0:12,0,48,990,990,990
ABCDEFGH.vcf.gz
Chrom_4	3627373	.	A	G,<*>	4.9	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:5:51:37,14,0:0.27451,0:3,0,40,990,990,990
F.vcf.gz
Chrom_5	8054576	.	C	A,<*>	24.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:24:46:27,19,0:0.413043,0:24,0,50,990,990,990
ABCDEFGH.vcf.gz
Chrom_5	8054576	.	C	A,<*>	5.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:6:39:25,14,0:0.358974,0:4,0,41,990,990,990
F.vcf.gz
Chrom_5	8054580	.	A	C,<*>	17.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:18:44:27,17,0:0.386364,0:17,0,45,990,990,990
ABCDEFGH.vcf.gz
Chrom_5	8054580	.	A	C,<*>	3.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:3:46:35,11,0:0.23913,0:0,0,32,990,990,990
F.vcf.gz
Chrom_4	5073318	.	A	T,<*>	14.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:14:39:22,16,0:0.410256,0:14,0,35,990,990,990
ACDFH.vcf.gz
Chrom_4	5073318	.	A	T,<*>	4.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:4:41:33,8,0:0.195122,0:2,0,31,990,990,990
BCDEFH.vcf.gz
Chrom_3	7892042	.	G	C,<*>	35	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:35:59:21,38,0:0.644068,0:34,0,54,990,990,990
ABCDEFGH.vcf.gz
Chrom_3	7892042	.	G	C,<*>	3.5	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:3:44:33,11,0:0.25,0:0,0,30,990,990,990
E.vcf.gz
Chrom_3	7892043	.	C	A,<*>	38.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:38:59:21,38,0:0.644068,0:38,0,55,990,990,990
ABCDEFGH.vcf.gz
Chrom_3	7892043	.	C	A,<*>	4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:4:63:52,11,0:0.174603,0:1,0,30,990,990,990
E.vcf.gz
Chrom_4	13530156	.	T	A,<*>	24.6	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:25:57:29,28,0:0.491228,0:24,0,60,990,990,990
ABCDEFGH.vcf.gz
Chrom_4	13530156	.	T	A,<*>	5	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:5:62:46,16,0:0.258065,0:3,0,36,990,990,990
E.vcf.gz
Chrom_4	13530161	.	A	T,<*>	16.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:16:57:30,27,0:0.473684,0:16,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_4	13530161	.	A	T,<*>	4.8	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:5:73:59,14,0:0.191781,0:2,0,35,990,990,990
E.vcf.gz
Chrom_2	7708865	.	T	A,<*>	33	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:33:51:31,20,0:0.392157,0:33,0,62,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	7708865	.	T	A,<*>	6.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:6:42:31,11,0:0.261905,0:5,0,35,990,990,990
BF.vcf.gz
Chrom_2	7708867	.	A	C,<*>	28.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:28:48:30,18,0:0.375,0:28,0,57,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	7708867	.	A	C,<*>	9.5	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:10:41:30,11,0:0.268293,0:9,0,37,990,990,990
BF.vcf.gz
Chrom_2	12461656	.	T	C,<*>	25.1	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:25:37:23,14,0:0.378378,0:25,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	12461656	.	T	C,<*>	12.6	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:13:36:23,13,0:0.361111,0:12,0,40,990,990,990
BG.vcf.gz
Chrom_2	12461657	.	C	T,<*>	26.3	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:26:37:23,14,0:0.378378,0:26,0,53,990,990,990
ABCDEFGH.vcf.gz
Chrom_2	12461657	.	C	T,<*>	13.9	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:14:37:24,13,0:0.351351,0:13,0,41,990,990,990
BG.vcf.gz
Chrom_4	5073318	.	A	T,<*>	14.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:14:39:22,16,0:0.410256,0:14,0,35,990,990,990
ACDFH.vcf.gz
Chrom_4	5073318	.	A	T,<*>	4.4	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:4:41:33,8,0:0.195122,0:2,0,31,990,990,990
BCDEFH.vcf.gz
Chrom_6	8358346	.	T	C,<*>	25.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:25:46:19,27,0:0.586957,0:25,0,44,990,990,990
ABCDEGH.vcf.gz
Chrom_6	8358346	.	T	C,<*>	3.6	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:4:44:29,13,0:0.295455,0:1,0,37,990,990,990
FH.vcf.gz
Chrom_6	4595590	.	C	T,<*>	37.6	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:38:59:33,26,0:0.440678,0:37,0,58,990,990,990
ABCDEFGH.vcf.gz
Chrom_6	4595590	.	C	T,<*>	3.7	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:4:57:46,11,0:0.192982,0:1,0,29,990,990,990
H.vcf.gz

Thank you.

Are you able to provide the input VCFs that generate these results? Even a small region that emits one of the duplicates would be helpful.

No worries.
I am sending you a VCF with chromosome 2.

Thanks, would you also be able to provide the SDF or FASTA reference that you used?

The problem is that RTG Tools vcfeval emits a true positive and a false positive for these records when comparing two of the VCFs. I'm not 100% sure that vcfeval supports 'gVCF' to be honest... I can ask on the rtg-users forum.

Hmm, interesting. I hadn't considered the possibility that RTG might not be able to handle gVCF.
I opened a thread on the rtg-users forum to ask.

Closing following discussion on rtg-users forum. No issue with vcfeval or Starfish.