locations with multiple ALTs produce duplicates in filtered outputs
Closed this issue · 0 comments
delocalizer commented
tail -n+2 chr22_KI270734v1_random_001_9_12_10.tsv | awk -F"\t" '{print $15"\t"$16"\t"$2"\t"$3"\t"$4}' | sort | uniq -c | sort -n | tail
1 chr22_KI270734v1_random 122129 T C T
1 chr22_KI270734v1_random 122134 T C T
1 chr22_KI270734v1_random 122208 T C G
1 chr22_KI270734v1_random 122233 G C G
1 chr22_KI270734v1_random 122241 A G A
64 chr22_KI270734v1_random 120663 G A G
64 chr22_KI270734v1_random 120663 G C G
64 chr22_KI270734v1_random 121454 T AA T
64 chr22_KI270734v1_random 121692 G TT G
729 chr22_KI270734v1_random 121191 T AA T
The last few positions have duplicates: 64 copies in the first four cases, and 729 in the last. Since 64 = 2^6 and 729 = 3^6, I'm guessing this is something like a combinatorial problem, e.g. two records getting expanded to 64, and three to 729.
These duplicates are not present in the unfiltered variants JSON.
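To make the arithmetic behind that guess concrete, here is a minimal sketch. It assumes a hypothetical model, not anything confirmed from the tool's code: a site with n ALT records has k fields that are each expanded independently, so every output row is repeated n^k times. The helper name `copies` and the choice of k = 6 are illustrative only.

```shell
# Hypothetical model (an assumption, not the tool's actual logic):
# n ALT records at a site, k independently expanded fields,
# so each output row appears n^k times.
copies() {
  awk -v n="$1" -v k="$2" 'BEGIN { c = 1; for (i = 0; i < k; i++) c *= n; print c }'
}

copies 2 6   # two ALT records, six expanded fields   -> 64
copies 3 6   # three ALT records, six expanded fields -> 729
```

If this model is right, the duplicate count at a site depends only on how many ALT records it has, which matches the 64 and 729 seen above.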