T2T Diversity Panel

This dataset includes sequencing data, assemblies, and analyses for the offspring of ten parent-offspring trios.

Data will be added and updated as technologies improve or new data uses are encountered. If you have issues or questions, open an issue on this GitHub page.

Data description

Each parent in a trio was sequenced with Illumina short reads. Each child was sequenced with Illumina short reads, 10X Genomics, Nanopore, PacBio CLR and HiFi, Bionano, and Hi-C.

For Nanopore datasets, each folder contains the fast5 files, the fastq files (basecalled with Guppy 2.3.5 flip-flop using the high accuracy model), and a sequencing summary file.

For PacBio CLR data, each folder contains a subread bam file, which can be converted to fasta/q using either bam2fastq or samtools fasta. The HiFi folders contain ccs.bam files, which have already been converted from subreads into high-fidelity reads. Like the CLR files, they can be converted to fasta/q using bam2fastq or samtools fasta.
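The conversion described above can be sketched as a small shell function. This is a minimal sketch, assuming samtools and gzip are on the PATH; the function name bam_to_fasta_gz and the example file name are hypothetical and not part of this dataset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a PacBio BAM (subreads or ccs) to gzipped FASTA.
# The output is written next to the input, with .bam replaced by .fasta.gz.
bam_to_fasta_gz() {
    bam="$1"
    out="${bam%.bam}.fasta.gz"
    # samtools fasta writes FASTA records to stdout; compress them on the fly.
    samtools fasta "$bam" | gzip > "$out"
}

# Example call (placeholder file name, commented out so nothing runs by default):
# bam_to_fasta_gz example.subreads.bam
```

Compressing on the fly avoids writing a large uncompressed intermediate file, which matters for subread BAMs of this size.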

For Bionano data, each folder contains both the assembled optical map (cmap) and the individual molecules (bnx.gz).

For the remaining short-read data, each folder contains one or more subfolders with fastq.gz files.

Data download

Data is hosted on AWS, and the links below lead to each individual data type. You can download each file directly through the browser or, alternatively, with the AWS CLI.

For example, suppose we want to download all PacBio CLR data for HG01109, located at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=NHGRI_UCSC_panel/HG01109/PacBio_CLR/. We remove index.html?prefix= from the URL and replace https://s3-us-west-2.amazonaws.com/ with s3://, giving s3://human-pangenomics/NHGRI_UCSC_panel/HG01109/PacBio_CLR/. We can then download all files in this subfolder with the command:

aws --no-sign-request s3 sync s3://human-pangenomics/NHGRI_UCSC_panel/HG01109/PacBio_CLR/ ./
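
The URL rewrite described above can also be scripted. The following is a minimal sketch, assuming sed is available; the function name browser_url_to_s3 is hypothetical, and the endpoint is hard-coded to the one used in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a browser index URL for this bucket into an s3:// URI.
browser_url_to_s3() {
    # 1) drop the "index.html?prefix=" component
    # 2) swap the HTTPS endpoint for the s3:// scheme
    printf '%s\n' "$1" | sed \
        -e 's|index\.html?prefix=||' \
        -e 's|^https://s3-us-west-2\.amazonaws\.com/|s3://|'
}

# Prints the s3:// form of the PacBio CLR folder for HG01109.
browser_url_to_s3 'https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=NHGRI_UCSC_panel/HG01109/PacBio_CLR/'
```

The printed URI can be passed directly to aws s3 sync in place of the hand-edited one.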

To instead download all data for this sample, run the command below. NOTE: HiFi and Illumina data will not be downloaded with this command and must be downloaded separately.

aws --no-sign-request s3 sync s3://human-pangenomics/NHGRI_UCSC_panel/HG01109/ ./

There are many other s3 commands, such as ls to list folder contents and cp to download individual files. Check the AWS documentation for more information. Adjusting settings such as max_concurrent_requests in the AWS CLI S3 configuration can further improve download performance.
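
The ls and cp commands mentioned above can be wrapped as follows. This is a minimal sketch, assuming the AWS CLI is installed; the function names s3_list and s3_fetch are hypothetical, the <file> placeholder must be replaced with a real object name, and the concurrency value of 20 is purely illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the AWS CLI for this public bucket.
# --no-sign-request skips credential signing, as in the sync commands above.

# List the contents of an S3 folder.
s3_list() {
    aws --no-sign-request s3 ls "$1"
}

# Copy a single object into a local directory (defaults to the current one).
s3_fetch() {
    aws --no-sign-request s3 cp "$1" "${2:-./}"
}

# Example calls (commented out so that sourcing this file triggers no network access):
# s3_list s3://human-pangenomics/NHGRI_UCSC_panel/HG01109/PacBio_CLR/
# s3_fetch "s3://human-pangenomics/NHGRI_UCSC_panel/HG01109/PacBio_CLR/<file>" ./

# Raise the number of parallel transfer requests for the default profile.
# 20 is an illustrative value, not a recommendation from this README.
# aws configure set default.s3.max_concurrent_requests 20
```

Listing a folder first is a cheap way to check file names and sizes before committing to a large sync.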

Below are the links to download each datatype:

HG01109 (Male, PUR)
- Father HG01107 Illumina data
- Mother HG01108 Illumina data
- Illumina left and right pairs
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
HG01243 (Male, PUR)
- Father HG01241 Illumina data
- Mother HG01242 Illumina data
- Illumina left and right pairs
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
HG02080 (Female, KHV)
- Father HG02082 Illumina data
- Mother HG02081 Illumina data
- Illumina left and right pairs
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
HG03098 (Male, MSL)
- Father HG03096 Illumina data
- Mother HG03097 Illumina data
- Illumina left and right pairs
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
HG02055 (Male, ACB)
- Father HG02053 Illumina data
- Mother HG02054 Illumina data
- Illumina left and right pairs
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
HG03492 (Male, PJL)
- Father HG03490 Illumina data
- Mother HG03491 Illumina data
- Illumina left and right pairs
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
HG02723 (Female, GWD)
- Father HG02721 Illumina data
- Mother HG02722 Illumina data
- Illumina left and right pairs
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
HG02109 (Female, ACB)
- Father HG02107 Illumina data
- Mother HG02108 Illumina data
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
HG01442 (Male, CLM)
- Father HG01440 Illumina data
- Mother HG01441 Illumina data
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
HG02145 (Male, ACB)
- Father HG02143 Illumina data
- Mother HG02144 Illumina data
- Illumina left and right pairs
- 10XG data
- PacBio CLR data
- PacBio HiFi data
- Nanopore data
- Bionano data
- Hi-C data
- Hifiasm trio assembly
