"the provided VCF appears to contain no invariant sites"
MafaldaSFerreira opened this issue · 2 comments
I am running pixy on per-chromosome VCF files that contain both variant and invariant sites. However, for some of the files, pixy reports that there are no invariant sites and fails with the following error:
Checking for invariant sites...Exception: [pixy] ERROR: the provided VCF appears to contain no invariant sites (ALT = "."). This check can be bypassed via --bypass_invariant_check 'yes'.
I am running pixy with the following command (this is an example for chromosome 10):
pixy --stats pi fst dxy --populations population_files/clusters_v01.txt --vcf chromosomes_maf0.01/herring_sentieon_125ind_230722_filter_setGT_miss0.2.vcf.minDP4.0maxDP3.0avg_miss0.2.maf0.01.ALLSites.chr10.vcf.gz --window_size 20000 --n_cores 2 --output_folder results/clusters_v01_maf0.01_20231011 --output_prefix clusters_v01.chr10.maf0.01.20kb.popgenpixy.out
I do have invariant sites in the vcf files. Here is a small subset of such sites for chromosome 10:
bcftools view --max-ac 2:nref herring_sentieon_125ind_230722_filter_setGT_miss0.2.vcf.minDP4.0maxDP3.0avg_miss0.2.maf0.01.ALLSites.chr10.vcf.gz | bcftools query -f '%CHROM\t%REF\t%ALT\t[%GT\t ]\n' | head -n1000 > chr10.maf0.01.1000.invariant.txt
And if I run bcftools stats on the vcf of chromosome 10, it says the total number of no-Alts is 17184629, whereas the number of SNPs is 492509 (also attaching the output of bcftools stats).
The only thing I could think of is that in this particular vcf the variant sites start before the invariant sites... Would that trigger the warning?
chr10.maf0.01.1000.invariant.txt
herring_sentieon_125ind_230722_filter_setGT_miss0.2.vcf.minDP4.0maxDP3.0avg_miss0.2.maf0.01.ALLSites.chr10.stats.txt
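One way to look into the ordering question above is to find how far into the file the first invariant record (ALT = ".") appears. The following is only a sketch: the function name `first_invariant_record` is made up for illustration, and it makes no claim about how pixy's own check works or how many records it inspects.

```shell
# Hypothetical helper: reads plain-text VCF on stdin and prints the
# 1-based index (among data records, header excluded) of the first
# record whose ALT column (field 5) is ".". Prints 0 if none is found.
first_invariant_record() {
  awk -F'\t' '
    /^#/ { next }                              # skip header lines
    { rec++ }                                  # count data records
    $5 == "." { print rec; found = 1; exit }   # stop at first invariant site
    END { if (!found) print 0 }
  '
}

# Example usage (your.vcf.gz is a placeholder for the per-chromosome file):
#   gzip -dc your.vcf.gz | first_invariant_record
```

A large value would mean the file opens with a long run of variant-only records, which is the situation the question describes.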
Hi There!
The check for invariant sites is fairly conservative, so if you are 100% sure that you have invariant sites correctly represented in your vcf, you can bypass the check by including --bypass_invariant_check 'yes'.
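Before bypassing the check, it may be worth confirming that invariant sites are encoded the way the error message expects (ALT = "."). A minimal sketch, assuming a plain-text VCF stream on stdin; the function name `count_invariant` is hypothetical and is not part of pixy or bcftools:

```shell
# Hypothetical helper: reads plain-text VCF on stdin and prints the number
# of data records whose ALT column (field 5) is exactly ".".
count_invariant() {
  awk -F'\t' '!/^#/ && $5 == "." { n++ } END { print n + 0 }'
}

# Example usage (your.vcf.gz is a placeholder for the per-chromosome file):
#   gzip -dc your.vcf.gz | count_invariant
```

If the count is non-zero and roughly matches the number of no-ALT records reported by bcftools stats, appending --bypass_invariant_check 'yes' to the original pixy command, as suggested above, seems reasonable.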