Enable bq_to_vcf to drop irrelevant variants when `--sample_names` is set
samanvp commented
Suppose bq_to_vcf is run with --sample_names S1 S2
and the following BQ table is given as the source table (each variant is listed with the samples that have it):
v1 [S1, S2, S3, S4]
v2 [S1, S3, S4]
v3 [S2, S3, S4]
v4 [S3, S4]
Currently, the output VCF file will include all 4 variants:
    S1   S2
v1  x    x
v2  x    0/0
v3  0/0  x
v4  0/0  0/0
where x indicates the value we read from the BQ table. Including v4 in the output VCF file when none of the samples of interest have that variant does not make much sense.
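To make the proposed behavior concrete, here is a minimal sketch of the filtering rule in Python. It is not the bq_to_vcf implementation: the `TABLE` dictionary is a hypothetical in-memory stand-in for the BQ table above, and `relevant_variants` is an illustrative name, not an existing function.

```python
# Hypothetical stand-in for the BQ table: variant -> samples that have it.
TABLE = {
    "v1": ["S1", "S2", "S3", "S4"],
    "v2": ["S1", "S3", "S4"],
    "v3": ["S2", "S3", "S4"],
    "v4": ["S3", "S4"],
}


def relevant_variants(table, sample_names):
    """Return the variants that at least one requested sample has."""
    wanted = set(sample_names)
    # Keep a variant only if its sample list overlaps the requested samples.
    return [
        variant
        for variant, samples in table.items()
        if not wanted.isdisjoint(samples)
    ]


print(relevant_variants(TABLE, ["S1", "S2"]))  # ['v1', 'v2', 'v3']
```

Under this rule v4 is dropped because neither S1 nor S2 appears in its sample list.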
samanvp commented
@tneymanov to follow up on our conversation:
If a user runs bq_to_vcf on the previous example with --sample_names S1 S2 S5,
our current output VCF file (without this issue fixed) is:
    S1   S2   S5
v1  x    x    0/0
v2  x    0/0  0/0
v3  0/0  x    0/0
v4  0/0  0/0  0/0
If we fix this issue, the output will not be empty. Instead, it will be:
    S1   S2   S5
v1  x    x    0/0
v2  x    0/0  0/0
v3  0/0  x    0/0
which is still a desirable output.
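The S5 case can be sketched the same way. The snippet below is an illustration under stated assumptions, not the actual bq_to_vcf code: `table` is a hypothetical in-memory copy of the example BQ table, `render_rows` is a made-up helper, and "x" is only a placeholder for the value read from BQ.

```python
def render_rows(table, sample_names, missing="0/0"):
    """Build (variant, calls) rows, dropping variants no requested sample has."""
    rows = []
    for variant, samples in table.items():
        present = set(samples)
        # Skip variants that none of the requested samples have.
        if present.isdisjoint(sample_names):
            continue
        # "x" stands for the value read from BQ; absent samples get 0/0.
        calls = ["x" if name in present else missing for name in sample_names]
        rows.append((variant, calls))
    return rows


# Hypothetical stand-in for the example BQ table.
table = {
    "v1": ["S1", "S2", "S3", "S4"],
    "v2": ["S1", "S3", "S4"],
    "v3": ["S2", "S3", "S4"],
    "v4": ["S3", "S4"],
}

for variant, calls in render_rows(table, ["S1", "S2", "S5"]):
    print(variant, *calls)
```

A requested sample that never appears in the table, such as S5, only contributes a 0/0 column. It does not cause any variant to be kept or dropped, so the output is non-empty as long as some other requested sample has a variant.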