Primary language: Shell. License: Mozilla Public License 2.0 (MPL-2.0).

DNA Chip-To-VCF

A small script that converts direct-to-consumer (DTC) DNA chip data (e.g. AncestryDNA or 23andMe) to a GRCh38-referenced VCF.

Prerequisites

You will need to install the following software to run the conversion script:

BCFtools

apt update
apt install -y python3 python3-pip python3-venv bcftools tabix

CrossMap

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
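
After installing, it can help to confirm that the tools are actually on your PATH before running the conversion. The following is a minimal sketch, not part of this repository: `check_tools` is a hypothetical helper that reports any command it cannot find.

```shell
# Hypothetical helper (not shipped with this repository): report which of
# the given command-line tools are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_tools() (
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
)
```

With the virtual environment active, call it as `check_tools bcftools tabix CrossMap`. The executable name `CrossMap` is an assumption; some CrossMap releases install it as `CrossMap.py`, so adjust the name if the check fails.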

Usage

Download Reference Data

Before running the conversion script, you need to download the reference genome data. This can be done by running the following command:

./download-references.sh

Convert to VCF

You can then convert your DNA chip data to a GRCh38-referenced VCF by running the following command:

./dna-chip-to-vcf.sh -o results/genome.vcf.gz -f AncestryDNA ./AncestryDNA.txt
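
Once the conversion finishes, you may want a quick sanity check of the output. The sketch below uses standard `bcftools` subcommands; `summarize_vcf` is a hypothetical helper name, and whether the header contains `##contig` lines depends on how the script writes the VCF, which this README does not specify.

```shell
# Hypothetical helper (not shipped with this repository): print a short
# summary of a compressed VCF using bcftools.
summarize_vcf() (
    vcf=$1
    if [ ! -r "$vcf" ]; then
        echo "cannot read: $vcf" >&2
        return 1
    fi
    # First few contig lines from the header, if any are present;
    # useful for checking chromosome naming.
    bcftools view -h "$vcf" | grep '^##contig' | head -n 5
    # Sample names stored in the VCF.
    bcftools query -l "$vcf"
    # Number of variant records (header lines excluded).
    bcftools view -H "$vcf" | wc -l
)
```

For the command above, this would be run as `summarize_vcf results/genome.vcf.gz`.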

Caveats

VCFs generated by this script have several limitations:

  • Because the DNA is assessed only at a limited number of discrete loci, the VCF will be missing many variants.
  • DTC DNA chip data carries no quality control information, so the VCF may contain systematic errors [1]. Interpret results with caution.
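
To get a feel for how many loci your chip actually covers, you can count the rows in the raw data file. This is a rough sketch under stated assumptions, not something this script does: it assumes an AncestryDNA-style layout with `#` comment lines, a header row starting with `rsid`, tab-separated columns `rsid, chromosome, position, allele1, allele2`, and `0` marking a no-call. `count_loci` is a hypothetical helper name; check your own file's layout before relying on it.

```shell
# Hypothetical helper: count genotyped loci and no-calls in an
# AncestryDNA-style raw data file.
# Assumed layout (not confirmed by this README): '#' comment lines, one
# header row starting with "rsid", then tab-separated
# rsid, chromosome, position, allele1, allele2, with "0" as a no-call.
count_loci() {
    awk -F '\t' '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        /^#/ || $1 == "rsid" { next }           # skip comments and header row
        { total++ }
        $4 == "0" || $5 == "0" { nocall++ }     # either allele missing
        END { printf "loci=%d nocalls=%d\n", total, nocall }
    ' "$1"
}
```

Running `count_loci ./AncestryDNA.txt` prints a single line of the form `loci=N nocalls=M`. No-call rows carry no usable genotype, which is one more reason the resulting VCF is sparse.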

References

  1. C. Lu, B. Greshake Tzovaras, and J. Gough, "A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research," Computational and Structural Biotechnology Journal, vol. 19, pp. 3747–3754, 2021, doi: https://doi.org/10.1016/j.csbj.2021.06.040.