TimD1/vcfdist

Contig not in truth

Closed this issue · 2 comments

When running vcfdist, if a contig has variants in the query VCF but not in the truth VCF, I get this error:

Contig 'plasmid_2' found in query VCF but not truth VCF. Please provide BED file.

It would be nice if vcfdist could handle this seamlessly and, when no BED file is provided, count every variant on such a contig as a false positive (FP). (This is what hap.py does.)
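A minimal sketch of the requested behaviour, assuming no BED file is supplied: every query variant whose contig is absent from the truth VCF is counted as an FP. The function name and the `(contig, pos)` input format are hypothetical and chosen for illustration; this is not vcfdist's actual implementation.

```python
from collections import Counter


def count_fps_on_truth_absent_contigs(query_variants, truth_contigs):
    """Count query variants on contigs missing from the truth VCF as FPs.

    Hypothetical helper, assuming no BED file was provided.
    query_variants: iterable of (contig, pos) tuples (assumed input format).
    truth_contigs: set of contig names present in the truth VCF.
    Returns a Counter mapping contig name -> number of FP variants.
    """
    fps = Counter()
    for contig, _pos in query_variants:
        # Without a BED, the truth VCF implicitly claims "no variants here",
        # so any query call on a truth-absent contig is a false positive.
        if contig not in truth_contigs:
            fps[contig] += 1
    return fps


query = [("chromosome", 100), ("plasmid_2", 5), ("plasmid_2", 42)]
truth = {"chromosome"}
print(count_fps_on_truth_absent_contigs(query, truth))
```

Variants on contigs that do appear in the truth VCF are left for the normal comparison; only the query-only contigs are short-circuited to FPs.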

TimD1 commented

I was originally on the fence about whether this should be a warning or an error, since I generally expected users to provide a BED file (making this irrelevant) and thought a common error mode would be accidentally including extra contigs in the query VCF but not the truth VCF (leading to extra FPs).

But you're right: I now think I should trust the user and permit this (since it's otherwise annoying to work around), so I'll downgrade it to a warning.

I should add a disclaimer here that I work with bacterial genomes, where we obviously have different norms from human genomics folks. I appreciate that for human variant calling a BED file is a necessity.

The other option would be adding a command-line flag that says "assume all positions", or something like that?
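The flag idea could look roughly like the sketch below: by default the missing-contig case stays an error, and an opt-in flag turns it into a warning whose contigs are then treated as all-FP. The `assume_all_positions` parameter and the function name are hypothetical illustrations of the suggestion, not existing vcfdist options; only the message text is copied from the error above.

```python
import warnings


def check_query_contigs(query_contigs, truth_contigs, bed_provided,
                        assume_all_positions=False):
    """Handle contigs present in the query VCF but absent from the truth VCF.

    assume_all_positions is a hypothetical flag, not a real vcfdist option.
    Returns the sorted list of query-only contigs.
    """
    missing = sorted(set(query_contigs) - set(truth_contigs))
    if missing and not bed_provided:
        for contig in missing:
            msg = (f"Contig '{contig}' found in query VCF but not truth VCF. "
                   "Please provide BED file.")
            if assume_all_positions:
                # Permissive mode: warn and let the caller score these contigs as FPs.
                warnings.warn(msg + " Treating its variants as false positives.")
            else:
                # Strict mode: keep the original hard error.
                raise ValueError(msg)
    return missing
```

Making the permissive behaviour the default instead is then just a matter of flipping the flag's default value, which is the trade-off discussed in the next comment.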

But I guess it depends on how interoperable you're trying to be with hap.py. If you want people to be able to switch over seamlessly, then changing the default behaviour makes sense.