
Sample code for analyzing VCF files (converted to Parquet) in Azure Databricks and Synapse.

License: GNU General Public License v3.0 (GPL-3.0)

VCF Analysis in Azure Synapse

Sample code for analyzing VCF files in Azure Synapse (once converted to Parquet using Glow).

Colby T. Ford, Ph.D.

Pipeline

Sample Code

  1. Convert VCF files to Parquet with Glow: ConvertVCFsToParquet.md
  2. Create an external table over the VCF-derived Parquet files in Azure Synapse: CreateVCFTable.md
  3. Run sample SQL queries: SampleQueries.md
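Step 1 relies on Glow's Spark `vcf` data source, which flattens each VCF record into a table row before it is written to Parquet. The stdlib-only sketch below is a rough illustration of that flattening, not Glow itself. It parses VCF data lines into rows whose field names approximate Glow's schema (contigName, start, end, referenceAllele, alternateAlleles, INFO_* fields). Several assumptions apply. Positions are converted to 0-based, end-exclusive intervals. INFO values are kept as raw strings, with flags set to True. The per-sample FORMAT/genotype columns are skipped. The records in SAMPLE_VCF are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class VariantRow:
    # Field names loosely mirror Glow's VCF schema; this is an illustration only.
    contigName: str
    start: int                  # 0-based, inclusive (VCF POS - 1)
    end: int                    # 0-based, exclusive (start + len(REF))
    names: List[str]
    referenceAllele: str
    alternateAlleles: List[str]
    qual: Optional[float]
    filters: List[str]
    info: Dict[str, Union[str, bool]]  # INFO_<key> -> raw string, or True for flags


def parse_vcf_line(line: str) -> VariantRow:
    """Parse the 8 fixed VCF columns; FORMAT and sample columns are ignored."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    start = int(pos) - 1
    info_fields: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Flag-type INFO entries (no "=") become True.
            info_fields[f"INFO_{key}"] = value if sep else True
    return VariantRow(
        contigName=chrom,
        start=start,
        end=start + len(ref),
        names=[] if vid == "." else vid.split(";"),
        referenceAllele=ref,
        alternateAlleles=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if flt == "." else flt.split(";"),
        info=info_fields,
    )


def parse_vcf(text: str) -> List[VariantRow]:
    """Skip '##' meta lines and the '#CHROM' header; parse every data line."""
    return [parse_vcf_line(ln) for ln in text.splitlines() if ln and not ln.startswith("#")]


# Made-up two-record VCF for illustration (not 1000 Genomes data).
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "22\t1000\t.\tAT\tA,ATT\t50\tPASS\tAC=1,2;AN=10;DB\n"
    "22\t2000\tvar2\tC\tT\t.\t.\tAN=10\n"
)

for row in parse_vcf(SAMPLE_VCF):
    print(row)
```

In the actual pipeline this flattening is done by Glow on Spark, and the resulting DataFrame is written out as Parquet. The sketch only shows the rough shape of the rows that end up in those files.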

Sample Data

The sample VCF data used in this demo comes from the Phase 3 release of the 1000 Genomes Project. It comprises roughly 168 GB of VCF files, which can be downloaded from the project's FTP site.
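Once variant data like this has been converted and exposed as a table, a typical exploratory query groups variants by chromosome. The sketch below runs such a query against an in-memory SQLite table, purely as a local stand-in for Synapse. The table name `vcf`, its column layout (one row per alternate allele), and the example rows are all hypothetical. It does not reproduce the queries in SampleQueries.md. In Synapse's T-SQL, the string-length function is `LEN` rather than SQLite's `LENGTH`.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical flattened layout: (contigName, position, referenceAllele, alternateAllele),
# with one row per alternate allele.
Row = Tuple[str, int, str, str]

# Count all variants and single-nucleotide variants (1-base REF and ALT) per chromosome.
SUMMARY_QUERY = """
SELECT contigName,
       COUNT(*) AS n_variants,
       SUM(CASE WHEN LENGTH(referenceAllele) = 1 AND LENGTH(alternateAllele) = 1
                THEN 1 ELSE 0 END) AS n_snvs
FROM vcf
GROUP BY contigName
ORDER BY contigName
"""


def variant_summary(rows: Iterable[Row]) -> List[Tuple[str, int, int]]:
    """Load rows into an in-memory table and return (contigName, n_variants, n_snvs)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE vcf (contigName TEXT, position INTEGER, "
            "referenceAllele TEXT, alternateAllele TEXT)"
        )
        con.executemany("INSERT INTO vcf VALUES (?, ?, ?, ?)", rows)
        return con.execute(SUMMARY_QUERY).fetchall()
    finally:
        con.close()


# Made-up example rows, not taken from 1000 Genomes.
EXAMPLE_ROWS = [
    ("21", 100, "A", "G"),
    ("21", 200, "AT", "A"),
    ("22", 300, "C", "T"),
    ("22", 400, "G", "A"),
    ("22", 500, "G", "GC"),
]

print(variant_summary(EXAMPLE_ROWS))  # [('21', 2, 1), ('22', 3, 2)]
```

Against the real external table, the same GROUP BY pattern applies. The query engine, however, scans the Parquet files in storage rather than an in-memory table.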

BlueGranite Resources