gymrek-lab/TRTools

Merge STR: More than one value found for END

bharathramh opened this issue · 1 comment

I'm trying to merge about three GangSTR output files into a single merged file. While doing so, I get the following error:

More than one value found for END

It ran for about 89k lines and stopped after this line:

(bharath) []$ tail -1 test.vcf.vcf
chr10   17809113        .       CCTCCCCTCCCCTCCCCTCCCCTCC       .       .       .       END=17809163;PERIOD=5;RU=cctcc;REF=5.0;STUTTERUP=0.05;STUTTERDOWN=0.05;STUTTERP=0.9;EXPTHRESH=-1 GT:DP:Q:REPCN:REPCI:RC:ML:INS:STDERR:ENCLREADS:FLNKREADS:QEXP   0/0:49:1.0:5,5:5-5,5-5:29,20,0,0:286.292:419.097,96.3636:0.0,0.0:5,29:NULL:-1.0,-1.0,-1.0        0/0:28:0.999683:5,5:5-5,5-5:16,12,0,0:167.282:416.998,95.8552:0.0,0.0:5,16:NULL:-1.0,-1.0,-1.0   0/0:42:1.0:5,5:5-5,5-5:25,17,0,0:251.281:415.347,94.2128:0.0,0.0:5,25:NULL:-1.0,-1.0,-1.0

I couldn't figure out the cause of the error from the VCF files. The lines below are the next records in each file, and I found that different END values are given for the same location. How do I resolve this?

(bharath) []$ zcat *.vcf.gz | grep -w "17813632"
chr10   17813632        .       TATA    .       .       .       END=17813701;EXPTHRESH=-1;GRID=1,5;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05     GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP   0/0:27:0.970717:2,2:2-2,2-2:8,19,0,0:2,8:NULL:187.217:415.347,94.2128:0,0:-1,-1,-1
chr10   17813632        .       TATA    .       .       .       END=17813703;EXPTHRESH=-1;GRID=1,5;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05     GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP   0/0:27:0.962654:2,2:2-2,2-2:8,19,0,0:2,8:NULL:187.622:415.347,94.2128:0,0:-1,-1,-1
chr10   17813632        .       TATA    .       .       .       END=17813703;EXPTHRESH=-1;GRID=1,5;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05     GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP   0/0:20:0.315071:2,2:1-3,1-3:2,18,0,0:2,2:NULL:160.457:416.998,95.8552:0.466294,0.466294:-1,-1,-1
chr10   17813632        .       TATA    TATATA  .       .       END=17813701;EXPTHRESH=-1;GRID=1,6;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05     GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP   0/1:24:0.33128:2,3:2-3,2-6:2,19,0,3:2,2:2,2|3,1:194.26:416.998,95.8552:0.500759,0.76488:-1,-1,-1
chr10   17813632        .       TATA    .       .       .       END=17813701;EXPTHRESH=-1;GRID=1,103;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05   GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP   0/0:14:0.00196308:2,2:1-23,1-23:0,14,0,0:NULL:NULL:115.062:419.097,96.3636:7.19692,7.19692:-1,-1,-1
chr10   17813632        .       TATA    TATATA  .       .       END=17813703;EXPTHRESH=-1;GRID=1,103;PERIOD=2;REF=2;RU=ta;STUTTERDOWN=0.05;STUTTERP=0.9;STUTTERUP=0.05   GT:DP:Q:REPCN:REPCI:RC:ENCLREADS:FLNKREADS:ML:INS:STDERR:QEXP   1/1:14:0.00181527:3,3:1-24,1-24:0,14,0,0:NULL:NULL:115.131:419.097,96.3636:6.13539,6.13539:-1,-1,-1
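To list every position where the input files disagree on END, rather than finding them one at a time with grep, something like the following minimal Python sketch could be used. It is not part of TRTools; the function names are hypothetical, and it assumes plain VCF text lines where column 2 is POS and column 8 is INFO, as in the VCF specification.

```python
from collections import defaultdict


def parse_info(info_field):
    """Turn a VCF INFO string like 'END=10;PERIOD=2' into a dict of strings."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")  # flags without '=' map to ''
        info[key] = value
    return info


def find_conflicting_end(records):
    """Return {(chrom, pos): sorted END values} for loci whose END differs.

    `records` is any iterable of VCF data lines; header lines are skipped.
    Fields are split on whitespace, so both tab-separated files and the
    space-aligned text pasted into this issue are accepted.
    """
    ends = defaultdict(set)
    for line in records:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        end = parse_info(info).get("END")
        if end is not None:
            ends[(chrom, pos)].add(int(end))
    # Keep only positions reported with more than one distinct END.
    return {locus: sorted(vals) for locus, vals in ends.items() if len(vals) > 1}
```

Feeding it the lines of all input files together (for example, opening each `*.vcf.gz` with `gzip.open(path, "rt")`) would report loci such as chr10:17813632 above, where END is 17813701 in some files and 17813703 in others.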

On the line where merging stopped, the POS and the length of the REF allele imply that the last base pair of the REF allele is at 17809137 (17809113 + 25 - 1, since REF is 25 bp long). But the END INFO field is 17809163. I assume it's erroring out because those don't match. I would check which of the POS, REF, or END fields was set incorrectly upstream.
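The consistency check described above can be sketched in a few lines of Python. This is a hypothetical helper, not TRTools code; it assumes VCF's 1-based POS and an inclusive END, so the last REF base is POS + len(REF) - 1, and that columns 2, 4 and 8 hold POS, REF and INFO.

```python
def parse_info(info_field):
    """Turn a VCF INFO string like 'END=10;PERIOD=2' into a dict of strings."""
    return {k: v for k, _, v in (e.partition("=") for e in info_field.split(";"))}


def ref_end_mismatch(line):
    """Compare END with the last REF base; return (expected, reported) on mismatch.

    Returns None when the record has no END, or when END equals
    POS + len(REF) - 1 (1-based POS, inclusive END).
    """
    fields = line.split()  # works for tab- or space-separated text
    pos, ref = int(fields[1]), fields[3]
    end = parse_info(fields[7]).get("END")
    if end is None:
        return None
    expected = pos + len(ref) - 1
    return None if int(end) == expected else (expected, int(end))
```

Applied to the record where merging stopped, it would return (17809137, 17809163), flagging the same discrepancy; running it over each input file would show which upstream records carry an END that disagrees with their REF allele.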