Dylan Taylor, Stephen Hwang, Samantha Zarate, Rajiv McCoy, Michael Schatz
Code associated with the manuscript "The complete sequence of a human Y chromosome". This repository contains the workflows used to perform short-read alignment and variant calling in 3202 samples from the 1000 Genomes Project (1KGP) and 279 open-access samples from the Simons Genome Diversity Project (SGDP), as well as code pertaining to downstream analysis of these alignments/variant calls. Code is organized into the following directories:
alignment_variant_calling_pipeline
- The workflows (as WDLs) used to perform short-read alignment and variant calling, along with descriptions of each step of the pipeline. The data itself is available for download from the public AnVIL workspace associated with this project.

1KGP_alignment_variant_calling_analysis
- A Jupyter notebook used for downstream analysis of the improvements in short-read alignment and variant calling for the 1KGP samples when using the T2T-CHM13v2.0 reference.

SGDP_alignment_variant_calling_analysis
- A Jupyter notebook used for downstream analysis of the improvements in short-read alignment and variant calling for the SGDP samples when using the T2T-CHM13v2.0 reference.