A small script that converts direct-to-consumer DNA chip data (e.g., AncestryDNA or 23andMe) to a GRCh38-referenced VCF.
You will need to install the following software to run the conversion script:
apt update
apt install -y python3 python3-pip python3-venv bcftools tabix
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
Before running the conversion script, you need to download the reference genome data. This can be done by running the following command:
./download-references.sh
You can then convert your DNA chip data to a GRCh38-referenced VCF by running the following command:
./dna-chip-to-vcf.sh -o results/genome.vcf.gz -f AncestryDNA ./AncestryDNA.txt
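As a quick sanity check after conversion, the number of records per chromosome in the output can be counted. The following is a minimal sketch, not part of this repository: `summarize_vcf` is a hypothetical helper name, and it assumes the output is a standard gzip- or bgzip-compressed VCF at the path passed to `-o` above.

```python
import gzip
from collections import Counter


def summarize_vcf(path):
    """Count variant records per chromosome in a gzip- or bgzip-compressed VCF."""
    per_chrom = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Header and metadata lines start with '#'; skip blank lines as well.
            if line.startswith("#") or not line.strip():
                continue
            # The first tab-separated column of a VCF record is the chromosome.
            chrom = line.split("\t", 1)[0]
            per_chrom[chrom] += 1
    return per_chrom
```

For example, `summarize_vcf("results/genome.vcf.gz")` returns a `Counter` keyed by chromosome name, and `sum(counts.values())` gives the total number of records.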
VCFs generated by this script are subject to several limitations, such as:
- Because the DNA is assessed at a limited number of discrete loci, the VCF will be missing many variants.
- Direct-to-consumer (DTC) DNA chip data includes no quality control information, so there will be systematic errors in the VCF [1]. Interpret results with caution.
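To get a feel for the first limitation, the number of loci in a raw data file can be counted and compared with the size of the genome. The sketch below is illustrative only and is not part of this repository: `count_loci` and `APPROX_GENOME_BP` are hypothetical names, and it assumes a tab-separated file whose first three columns are rsid, chromosome, and position, with missing genotypes written as `0` (AncestryDNA) or `-` (23andMe).

```python
# Approximate length of the GRCh38 assembly in base pairs (rounded).
APPROX_GENOME_BP = 3_100_000_000


def count_loci(path):
    """Return (total_loci, no_call_loci) for a tab-separated raw genotype file."""
    total = 0
    no_calls = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Skip blank lines, '#' comment lines, and the column header row.
            if not line or line.startswith("#") or line.startswith("rsid"):
                continue
            fields = line.split("\t")
            # Assumption: columns are rsid, chromosome, position, then genotype
            # in one column (23andMe) or two allele columns (AncestryDNA).
            genotype = "".join(fields[3:])
            total += 1
            # Assumption: a missing genotype contains '0' or '-'.
            if not genotype or set(genotype) & {"0", "-"}:
                no_calls += 1
    return total, no_calls
```

With `total, no_calls = count_loci("./AncestryDNA.txt")`, the ratio `total / APPROX_GENOME_BP` gives a rough upper bound on the fraction of genome positions the chip assays at all.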
[1] C. Lu, B. Greshake Tzovaras, and J. Gough, "A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research," Computational and Structural Biotechnology Journal, vol. 19, pp. 3747–3754, 2021, doi: https://doi.org/10.1016/j.csbj.2021.06.040.