/t2t-chm13-chry


T2T-CHM13v2.0 short-read alignment and variant calling

Dylan Taylor, Stephen Hwang, Samantha Zarate, Rajiv McCoy, Michael Schatz

Code associated with the manuscript "The complete sequence of a human Y chromosome". This repository contains the workflows used to perform short-read alignment and variant calling in 3202 samples from the 1000 Genomes Project (1KGP) and 279 open-access samples from the Simons Genome Diversity Project (SGDP), as well as code for downstream analysis of these alignments and variant calls. Code is organized into the following directories:

  1. alignment_variant_calling_pipeline - The workflows (as WDLs) used to perform short-read alignment and variant calling, along with a description of each step of the pipeline. The data are available for download from the public AnVIL workspace associated with this project.
  2. 1KGP_alignment_variant_calling_analysis - A Jupyter notebook used for downstream analysis of improvements in short-read alignment and variant calling for the 1KGP samples using the T2T-CHM13v2.0 reference.
  3. SGDP_alignment_variant_calling_analysis - A Jupyter notebook used for downstream analysis of improvements in short-read alignment and variant calling for the SGDP samples using the T2T-CHM13v2.0 reference.