Error with parsing alleles depth
glaudelrio opened this issue · 7 comments
Hi! I am trying to run vcf2bgc and I found this error:
$ vcf2bgc.py -v chr22_ldna.recode.vcf -m population_map.txt --p1 P1 --p2 P2 --admixed ADMIXED --outprefix clines_chr22
P1 population has 6 individuals...
P2 population has 8 individuals...
Admixed populalation has 67 individuals...
Processing 1563 records in VCF file...
Traceback (most recent call last):
File "/home/user/app/src/scripts/vcf2bgc.py", line 240, in get_allele_depth
alleles = call.data[2].split(",")
AttributeError: 'int' object has no attribute 'split'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/app/src/scripts/vcf2bgc.py", line 422, in <module>
main()
File "/home/user/app/src/scripts/vcf2bgc.py", line 172, in main
write_output(record, popsamples, ref, alt, locus, args.outprefix, admix, p1, p2)
File "/home/user/app/src/scripts/vcf2bgc.py", line 285, in write_output
admix_output = get_allele_depth(record, "Admixed", ref, alt, sampledict)
File "/home/user/app/src/scripts/vcf2bgc.py", line 254, in get_allele_depth
raise AttributeError("Error with parsing allele depths!")
AttributeError: Error with parsing allele depths!
My vcf was generated with GATK 4. Any idea on what is going on?
Thank you so much!
Best wishes
Hi.
The error occurs because vcf2bgc.py expects the allele depths in a specific field, as in the VCF files produced by ipyrad and stacks. Your GATK 4 file seems to either lack the allele depth (AD) field or store it in a different position or format. vcf2bgc.py hasn't been tested with GATK 4 VCF files, and VCF files vary a lot in format.
If you could email me your VCF file, I can see whether it would be possible to add support for GATK 4 VCF files. If it has the allele depth (AD) field in the FORMAT column, then it should be possible to add it. See the example record below for where to find the AD field.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12878 [other samples...]
20 10001019 . T G 364.77 . [CLIPPED] GT:AD:DP:GQ:PL 0/1:18,15:33:99:393,0,480
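To check whether a record like the one above carries allele depths, you can look up AD by name in the FORMAT column instead of relying on its position. This is only a minimal sketch in plain Python: `allele_depths` is a hypothetical helper, not part of vcf2bgc.py, and the clipped INFO column is replaced with a `.` placeholder.

```python
# Hypothetical helper for illustration only; it is not part of vcf2bgc.py.
def allele_depths(format_field, sample_field):
    """Return AD as a list of ints (None for '.' entries), or None if AD is absent."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys:
        return None
    idx = keys.index("AD")
    # Trailing sub-fields may be dropped for missing calls (e.g. "./.").
    if idx >= len(values) or values[idx] in (".", ""):
        return None
    return [None if v == "." else int(v) for v in values[idx].split(",")]


# The example record, tab-separated; INFO is a "." placeholder (assumption).
record = "20\t10001019\t.\tT\tG\t364.77\t.\t.\tGT:AD:DP:GQ:PL\t0/1:18,15:33:99:393,0,480"
fields = record.split("\t")
format_field, samples = fields[8], fields[9:]

for sample in samples:
    print(allele_depths(format_field, sample))
```

In this layout the third sub-field is DP, a single integer, so a parser that reads a fixed position would not find a comma-separated string there. Whether that is what happened in the traceback above is a guess.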
Hi,
I was not able to reproduce the error you received with the VCF file you sent me. Note that I subset the file to the same number of sites reported in your output above, and I am not 100% sure that I assigned the same individuals to the P1, P2, and admixed populations. But it ran all the way through. Is the VCF file you ran above any different from the one you sent me?
-Bradley
Again, I apologize for my delayed reply. I think I know what the issue is. I added support for stacks recently, and I think I need to rebuild the Docker image. In other words, the vcf2bgc.py file in the ClineHelpR/scripts directory is correct, but I haven't added the updated file to the Docker image.
The reason I mentioned stacks is that I think the stacks support also works with VCF files generated by GATK 4.
Hi. I fixed the issue with the Docker version of the vcf2bgc.py script. It should work now. I'm closing the issue, but if it still doesn't work let me know.
You should run:
sudo docker pull btmartin721/clinehelpr:latest
to pull the new docker image.