We have sequenced the CHM13hTERT human cell line on the Oxford Nanopore GridION. We have also generated approximately 50x coverage of 10X Genomics data, as well as BioNano DLS and Arima Genomics HiC data. PacBio data for this cell line was previously generated by the Washington University School of Medicine and the University of Washington, and is available from NCBI SRA.
Human genomic DNA was extracted from the cultured cell line. As the DNA is native, modified bases will be preserved. We followed Josh Quick's ultra-long read (UL) protocol for library preparation and sequencing.
All data is released to the public domain (CC0) and we encourage its reuse. While not required, we would appreciate it if you would acknowledge the "telomere-to-telomere" (T2T) consortium for the creation of this data, and we encourage you to join us if you would like to help finish the human reference genome. More information about our consortium can be found on the T2T homepage.
The current assembly draft (v0.4) was generated with Canu v1.7.1, including rel1 data up to 2018/11/15 and incorporating the previously released PacBio data. Two gaps on the X plus the centromere were manually resolved. The assembly was polished with two rounds of nanopolish and two rounds of arrow. The estimated base accuracy is currently QV36, which we expect to improve with future integration of the 10X Genomics data. Structural variants on the X were identified with BioNano; nanopore reads mapping to each locus were selected, reassembled, and used to patch the assembly. However, these patches are not yet polished or validated using BioNano. The assembly has not been curated outside of the X chromosome.
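For orientation, a quality value such as QV36 can be turned into an expected per-base error rate. The sketch below assumes the QV is Phred-scaled (error probability = 10^(-QV/10)), which is the usual convention but is not stated explicitly here. The function name `qv_to_error_rate` is hypothetical.

```shell
# Convert a Phred-scaled quality value into an expected per-base error rate.
# Assumption: QV = -10 * log10(P_error). qv_to_error_rate is an illustrative name.
qv_to_error_rate() {
    awk -v qv="$1" 'BEGIN { printf "%.2e\n", 10 ^ (-qv / 10) }'
}

qv_to_error_rate 36   # roughly 2.5 errors per 10 kbp
```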
The assembly is 2.94 Gbp in size with 657 contigs and an NG50 of 85.8 Mbp.
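As a reminder of what NG50 measures: contigs are taken from longest to shortest, and NG50 is the length of the contig at which the running total first reaches half of an assumed genome size (N50 uses half of the assembly size instead). The following is a minimal sketch on toy numbers. The genome size used for the figure above is not stated here, and `ng50` is a hypothetical helper, not a tool shipped with this dataset.

```shell
# Compute NG50 from contig lengths supplied one per line on stdin.
# Usage: ng50 GENOME_SIZE < lengths.txt
# ng50 and lengths.txt are illustrative names, not part of this dataset.
ng50() {
    sort -nr | awk -v genome="$1" '
        { total += $1 }                          # running sum, longest contig first
        total >= genome / 2 { print $1; exit }   # first contig to reach half the genome
    '
}

# Toy example: five contigs against an assumed genome size of 200.
printf '%s\n' 10 50 20 40 30 | ng50 200
```

With the toy input, the running total reaches 100 at the third-longest contig, so the function prints 30.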
This should be considered a draft and likely has mis-assemblies, inaccurate consensus, and frame-shifted genes. It will be further validated, scaffolded with BioNano, and polished using the available data.
- Assembly draft v0.4 (md5: 7e3c2fff9479ba45f7916fa1eee1310b)
We sequenced approximately 100 flowcells of UL data for a total of 155 Gbp (50x coverage, 1.6 Gbp/flowcell). The read N50 is 70 kbp and there are 99 Gbp of data in reads >50 kbp (32x). The longest mapping read is 1.04 Mbp.
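The coverage figures above follow from dividing the sequenced bases by the genome size. This small sketch reproduces them, assuming a genome size of 3.1 Gbp (the value implied by 155 Gbp being about 50x; it is not stated explicitly). The helper name `coverage` is hypothetical.

```shell
# Coverage = sequenced bases / genome size (both given in Gbp).
# Assumption: genome size of 3.1 Gbp, implied by 155 Gbp being about 50x.
coverage() {
    awk -v bases="$1" -v genome="$2" 'BEGIN { printf "%.1f\n", bases / genome }'
}

coverage 155 3.1   # all reads
coverage 99 3.1    # reads >50 kbp only
```

The two calls print 50.0 and 31.9, matching the rounded 50x and 32x quoted above.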
rel2 is the same data as rel1 but re-basecalled with the latest-generation basecaller (Guppy flip-flop 2.3.1). We have provided mappings, generated with minimap2, both to our current draft assembly and to GRCh38 with decoys, in CRAM format.
- Guppy flip-flop 2.3.1 (md5: 7e3f4ded02d500a3db0c76c84cdc42b9)
- Guppy flip-flop mapped to asm v0.4 with minimap2 (md5: 09d87ae044d1628056cb95690dc93378)
- Guppy flip-flop mapped to GRCh38 with decoys with minimap2 (md5: 1a4888cafbc935a21c17f449b4802438)
rel1 is the full dataset as of 2019/01/09. These basecalls were generated on-instrument and use older versions of Guppy (depending on when the flowcell ran on the instrument).
- Guppy on-instrument (md5: c2cb74601eb657df21b7d25980908288)
The raw fast5 data, without basecalls, is available for completeness. The data is grouped into 96 sets.
- Partition 001 (md5: c837460c50a4446fc8320c95dc88f204)
- Partition 002 (md5: 05ceccf4256d248aaec2a4c61e58c26c)
- Partition 003 (md5: 879e3a6391e5da5f943fa46b92decd47)
- Partition 004 (md5: 600bfa46c741eeff0064b1d8040b9349)
- Partition 005 (md5: 1a72beff4b2e4556c5033176ed1cd109)
- Partition 006 (md5: fcd6f8ceeac2034eddaa33cedf6d0010)
- Partition 007 (md5: 0d44cb41a4888b55bce2cba7e70107ba)
- Partition 008 (md5: 52242770505ac9aca1070e0b926c4769)
- Partition 009 (md5: 4e85e63a4ebf8efb2f97fdcee46e5737)
- Partition 010 (md5: e495530dd8a68b7bc9864ab89a4ef52f)
- Partition 011 (md5: 3b57e6256d0162d83a281e74157134e0)
- Partition 012 (md5: 735a0a03c6bec1e0ed417baa0c2d7db2)
- Partition 013 (md5: 90c51a9ab06266b2a980bcc16d3d3960)
- Partition 014 (md5: 645ea0b4edc2bfc71c708a53d5b0d92b)
- Partition 015 (md5: 24f456adb4c1c6579fe34f07c82179e7)
- Partition 016 (md5: 6b72ddda5a7a1c10b50f3026914519ec)
- Partition 017 (md5: 14e7b918b28ecc784b68569454fa27d9)
- Partition 018 (md5: d5f7c9b1d88cf48298f6cbbb2a2a45a9)
- Partition 019 (md5: cefa121a627dfcf9a1dfb117065a7264)
- Partition 020 (md5: ca0729b28cd4cccc81eba670c6e86689)
- Partition 021 (md5: 51a873a2019f2b091ab035cc3f074bb8)
- Partition 022 (md5: e9235f052d651b4ba1fdaaa06ad134d0)
- Partition 023 (md5: 75ebfdb40745d667962a19a0aa838837)
- Partition 024 (md5: e1e05425f9823e50650bd2cf1efa41c6)
- Partition 025 (md5: f8efb23a5e77b12f46bce73b2ddba36a)
- Partition 026 (md5: 829f32786514b092da9e4fb8701da037)
- Partition 027 (md5: 15ebb086d975583386c1d0e49fbca932)
- Partition 028 (md5: 3dd39dee6efea9b1b50d282d1d2aae19)
- Partition 029 (md5: 3c5b3522dd741214554f84d8645cdf20)
- Partition 030 (md5: 1ef7fe24c315085d8dcfe4e6ba9b4de2)
- Partition 031 (md5: e9501d4d0fd38d64c2ad1c81f8d1a0e3)
- Partition 032 (md5: 1f3ff51da0e87c2009bef8256b930f0b)
- Partition 033 (md5: 76a518084b021db82fd5dab7540e88bb)
- Partition 034 (md5: fd9f4dcfaeb89134a4f700a5346c16fa)
- Partition 035 (md5: dbdd53ba61d67a7f61405ae39d2b931b)
- Partition 036 (md5: c243b8f64bde0051fe104e8baaecf09b)
- Partition 037 (md5: aafa1d558881b2b4856fde3af0cbb9b2)
- Partition 038 (md5: d2e39e42eaf6a0a63d0542435590dd88)
- Partition 039 (md5: ef48d5c46f19de02fb6f6646726c95de)
- Partition 040 (md5: 17d7d34b45e14b2a79fc30e5c5084315)
- Partition 041 (md5: eb6a16d0b37d538bdbf90c3bfcc0f098)
- Partition 042 (md5: 7dbf87d75c901463b2e4e4afdc4adb52)
- Partition 043 (md5: 97c071a1d0a170e9f4809f6cdc459a6b)
- Partition 044 (md5: 27dc707435a2c98fc7201ccefec68c9d)
- Partition 045 (md5: 54ce28e1e1b54ab9fd8dd072711acd30)
- Partition 046 (md5: b174c7826fc399312fad331660745e55)
- Partition 047 (md5: 2b6ce400051fce5d2de09fd8fd461fc8)
- Partition 048 (md5: 81415b29f2b6a605473af6d3529758b1)
- Partition 049 (md5: ffc9182d8a9ad9752b6571d3d2f2b69d)
- Partition 050 (md5: 790281fcf0512a798b6f0e75b14620be)
- Partition 051 (md5: 4fc5dc17819a3727e5cedaa89550ef9f)
- Partition 052 (md5: d33a70e926dee0e67cf1a75d50ee1249)
- Partition 053 (md5: 9d66e1372866dd454173f486d57ae322)
- Partition 054 (md5: 958b62e07349258d93ee3e089c6f91ff)
- Partition 055 (md5: 0e605a04d9bbeb0573aefddbfae12bd6)
- Partition 056 (md5: 29b205c649f66e3d44ea9f598b492bc2)
- Partition 057 (md5: 7336b91e333ae912b4cfc6e366570c54)
- Partition 058 (md5: 2d992482005a2523f710487f2c0a0a31)
- Partition 059 (md5: 3b45c205982796a90aa0f40955c4937b)
- Partition 060 (md5: f085ae6a4818c44d03a6f5adfc445699)
- Partition 061 (md5: 1c5a3a0ed8b53a930535b9d34e6a0667)
- Partition 062 (md5: fbfd4ffb7cf8fca4d613d0ec67d3104c)
- Partition 063 (md5: 9ddf7a9fe7e9cf8ceb02b8debed41fcc)
- Partition 064 (md5: ee3ac8080a19d4a6ab3af84074d03d7a)
- Partition 065 (md5: d94a12692d399c44612cab8b2aea8164)
- Partition 066 (md5: a9f3bfa69bbc248b33f99f42827331eb)
- Partition 067 (md5: 6c9d4b38edc6f78521f3cfdd8edc571c)
- Partition 068 (md5: 76a29683bfad7c4a0b8a0bdbbbd6fd49)
- Partition 069 (md5: f924667636c528d56e46aa92db0a182d)
- Partition 070 (md5: f813b0a4b2a4a2353c7deb539f16f286)
- Partition 071 (md5: fa56e2524ea2cc57e79f692466375b83)
- Partition 072 (md5: 23b1df220d55ab9df2735c74849a53c9)
- Partition 073 (md5: 70839cbc61d3d8af7fafcb7ba8f96461)
- Partition 074 (md5: 109b91ceda32ab0f8b9edb24cb35fb23)
- Partition 075 (md5: 53c466af09a3a119df3255189091bcda)
- Partition 076 (md5: 22ad2327db64767e34378508afe60706)
- Partition 077 (md5: 64c7c1702e3476137c54ebc0c07d970e)
- Partition 078 (md5: 6e2048a8a2ceb36bb679455e0af81230)
- Partition 079 (md5: 45717c24fe844f2605be81bd8e15d856)
- Partition 080 (md5: 1ac20637828f0f3115f1c0f289e006aa)
- Partition 081 (md5: e7b5e584de5f2cbda1d53ec2f6e2668e)
- Partition 082 (md5: aad214d168ad3a59488dfac71fcedc22)
- Partition 083 (md5: d557dee3b08c61d540fd6a00689341fa)
- Partition 084 (md5: cc2b4676515b988dd4f64724e49c3304)
- Partition 085 (md5: 34e6154991e5d5c641e22a529c5f06e1)
- Partition 086 (md5: 2f9ff4371f32c3a33ea081ad8825437e)
- Partition 087 (md5: 945504e89ba54cdab032eac63985d216)
- Partition 088 (md5: 46a8ba05cb12b268c7f7ce04575d24da)
- Partition 089 (md5: 5fd0219c9c99aa08ce07bb35e647144c)
- Partition 090 (md5: da0e3f19f81c99a89bcff7e8f74dc6cb)
- Partition 091 (md5: c11b11f3386d47dd33acc3cba7f44fb2)
- Partition 092 (md5: 87dfa60ae9308214b43aa7075ddd9f44)
- Partition 093 (md5: 6eced035881d3e804bea7103d26c042e)
- Partition 094 (md5: 59ebbc64994779244e5f7431c54b819e)
- Partition 095 (md5: 4de3c1f5163357a256847c1082379df3)
- Partition 096 (md5: cf16e88c803b82b052651171490d6d5a)
Approximately 50x of data was generated on a NovaSeq instrument. Based on the summary output of Supernova, there are 1.2 billion reads with 41x effective coverage. The mean molecule length is 130 kbp, with an N50 of 864 reads per barcode.
- CHM13_prep5_S13_L002_I1_001 (md5: 84af4586ca9f78060d5802b36cdd9e8a)
- CHM13_prep5_S13_L002_R1_001 (md5: 231633e0cf2fbdeba732dc7ad6233fa0)
- CHM13_prep5_S13_L002_R2_001 (md5: 386febfc3fc760e11e315e69310ed3d8)
- CHM13_prep5_S14_L002_I1_001 (md5: f0b7628e90dfaf2f702ec613c7b61ca7)
- CHM13_prep5_S14_L002_R1_001 (md5: 86afbc7a41ea1c81657bf1ca64d1178c)
- CHM13_prep5_S14_L002_R2_001 (md5: 3dfbe58b5ae715213e20614837dcf3b7)
- CHM13_prep5_S15_L002_I1_001 (md5: ee34f03c765787ea069050d8eaac1de4)
- CHM13_prep5_S15_L002_R1_001 (md5: 73edcb56dd18d7b7b2705b4db7b4efc5)
- CHM13_prep5_S15_L002_R2_001 (md5: a0de8e5bc127203129e4e1437b3e6aaa)
- CHM13_prep5_S16_L002_I1_001 (md5: 42db246f7e5725a7b6ff3f5f5aedfd6e)
- CHM13_prep5_S16_L002_R1_001 (md5: 3d3db7eccaf388fbcd901cbc6ad47630)
- CHM13_prep5_S16_L002_R2_001 (md5: 9dfcc17398a7acd906212a09ab4c8903)
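Every file listed in this document is accompanied by an md5 checksum, so a download can be checked once it is on disk. The following is a minimal sketch of such a check, assuming GNU coreutils `md5sum` is installed (on macOS, `md5 -q` is the rough equivalent). The function name `verify_md5` is hypothetical.

```shell
# Compare a file's md5 against the checksum listed in this document.
# Assumption: GNU coreutils md5sum is on PATH. verify_md5 is an illustrative name.
verify_md5() {
    file=$1
    expected=$2
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}

# Example, once the file has been downloaded to the current directory:
#   verify_md5 CHM13_prep5_S13_L002_I1_001.fastq.gz 84af4586ca9f78060d5802b36cdd9e8a
```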
Approximately 430x of data was generated using the Saphyr instrument and the DLE-1 enzyme. There are 15.2 M molecules with an N50 molecule length of 115.9 kbp and a max of 2.3 Mbp (2 M molecules > 150 kbp, N50 218 kbp). The assembly of the molecules is 2.97 Gbp in size with 255 contigs and an NG50 of 59.6 Mbp.
The HiC raw data will be available soon.
The PacBio data was previously generated and is available from the SRA.
Files are generously hosted by Amazon Web Services. Although they are available as straightforward HTTP links, download performance is improved by using the Amazon Web Services command-line interface. References should be amended to use the s3:// addressing scheme, i.e. replace https://s3.amazon.com/nanopore-human-wgs/ with s3://nanopore-human-wgs/ to download. For example, to download CHM13_prep5_S13_L002_I1_001.fastq.gz to the current working directory, use the following command.
aws s3 --no-sign-request cp s3://nanopore-human-wgs/chm13/10x/CHM13_prep5_S13_L002_I1_001.fastq.gz .
Or, to download the full dataset, use the following command.
aws s3 --no-sign-request sync s3://nanopore-human-wgs/chm13/ .
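If you have a list of HTTP links, the prefix substitution described above can be scripted. This is a sketch only: the HTTP prefix is copied verbatim from the text above and should be adjusted if your links use a different host, and `to_s3_uri` is a hypothetical helper name.

```shell
# Rewrite an HTTP link for this bucket into its s3:// equivalent.
# The prefixes are the ones quoted in the text; to_s3_uri is an illustrative name.
to_s3_uri() {
    http_prefix='https://s3.amazon.com/nanopore-human-wgs/'
    s3_prefix='s3://nanopore-human-wgs/'
    case "$1" in
        "$http_prefix"*)
            # Strip the HTTP prefix and prepend the s3:// one.
            printf '%s%s\n' "$s3_prefix" "${1#"$http_prefix"}"
            ;;
        *)
            echo "not a nanopore-human-wgs HTTP link: $1" >&2
            return 1
            ;;
    esac
}

to_s3_uri 'https://s3.amazon.com/nanopore-human-wgs/chm13/10x/CHM13_prep5_S13_L002_I1_001.fastq.gz'
```

The printed URI can be passed directly to `aws s3 --no-sign-request cp`.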
The s3 command can also be used to get information on the dataset, for example reporting the size of every file in human-readable format.
aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://nanopore-human-wgs/chm13/
Or, to obtain technology-specific sizes, use the following commands.
aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://nanopore-human-wgs/chm13/nanopore/fast5
aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://nanopore-human-wgs/chm13/nanopore/rel2
aws s3 --no-sign-request ls --recursive --human-readable --summarize s3://nanopore-human-wgs/chm13/assemblies
Amending the max_concurrent_requests and related settings as per this guide will improve download performance further.
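As a sketch of what such a change might look like, the AWS CLI lets S3 transfer settings be stored with `aws configure set`. The wrapper name `tune_s3_transfers` and the example value are assumptions to be tuned for your own network. The CLI default for `max_concurrent_requests` is 10.

```shell
# Raise the number of parallel S3 transfers used by the AWS CLI.
# The value passed in is an assumption to tune for your network; the CLI default is 10.
# tune_s3_transfers is an illustrative wrapper name.
tune_s3_transfers() {
    aws configure set default.s3.max_concurrent_requests "$1"
}

# Example (writes to the default profile in ~/.aws/config):
#   tune_s3_transfers 20
```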
Please raise issues concerning this dataset on this GitHub repository.
* rel1 and 2: 2nd March 2019. Initial release.