Mark Akeson (1), Andrew D. Beggs (2), Thomas Nieto (2), Miten Jain (1), Nicholas J. Loman (3), Matt Loose (4), Sunir Malla (4), Justin O’Grady (5), Hugh E. Olsen (1), Josh Quick (3), Hollian Richardson (5), Jared T. Simpson (6,7), Terrance P. Snutch (8), Louise Tee (2), John R. Tyson (8)
1. University of California, Santa Cruz, Santa Cruz, CA, USA
2. University of Birmingham, Birmingham, B15 2TT
3. Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
4. DeepSeq, School of Life Sciences, University of Nottingham, Nottingham, UK
5. Norwich Medical School, University of East Anglia, Norwich, NR4 7UQ, United Kingdom
6. Ontario Institute for Cancer Research, Toronto, Canada
7. Department of Computer Science, University of Toronto, Toronto, Canada
8. Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
We have sequenced the CEPH1463 (NA12878/GM12878, Ceph/Utah pedigree) human genome reference standard on the Oxford Nanopore MinION, mostly using 1D ligation kits (450 bp/s) with R9.4 chemistry (FLO-MIN106); a few runs used Rapid kits (see the table below).
Human genomic DNA from the GM12878 cell line (Ceph/Utah pedigree) was either purchased from Coriell (cat. no. NA12878; SampleType "DNA" in the tables below) or extracted from the cultured cell line (SampleType "Cells"). Because the DNA is native rather than amplified, base modifications are preserved.
Files are generously hosted by Amazon Web Services. Although they are available as plain HTTP links, download performance is improved by using the Amazon Web Services command-line interface (AWS CLI). To download with the CLI, amend links to use the `s3://` addressing scheme, i.e. replace `http://s3.amazon.com/nanopore-human-wgs/` with `s3://nanopore-human-wgs/`. For example, to download `rel3-nanopore-wgs-288418386-FAB39088.fastq.gz` to the current working directory, use the following command:
```bash
aws s3 cp s3://nanopore-human-wgs/rel3-nanopore-wgs-288418386-FAB39088.fastq.gz .
```
Adjusting `max_concurrent_requests` and related settings, as described in this guide, will improve download performance further.
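The link rewriting described above is mechanical, so it can be scripted when fetching many files. This Python sketch assumes every link starts with exactly the `http://s3.amazon.com/nanopore-human-wgs/` prefix quoted above; the helper names `http_to_s3` and `aws_cp_args` are hypothetical, and the command is only built and printed, not executed.

```python
from __future__ import annotations

import shlex

# Prefixes quoted in the text above; links with any other prefix are rejected.
HTTP_PREFIX = "http://s3.amazon.com/nanopore-human-wgs/"
S3_PREFIX = "s3://nanopore-human-wgs/"


def http_to_s3(url: str) -> str:
    """Rewrite a dataset HTTP link into the s3:// form expected by the AWS CLI."""
    if not url.startswith(HTTP_PREFIX):
        raise ValueError(f"not a nanopore-human-wgs HTTP link: {url}")
    return S3_PREFIX + url[len(HTTP_PREFIX):]


def aws_cp_args(url: str, dest: str = ".") -> list[str]:
    """Build the argument list for `aws s3 cp <s3-uri> <dest>`."""
    return ["aws", "s3", "cp", http_to_s3(url), dest]


if __name__ == "__main__":
    link = HTTP_PREFIX + "rel3-nanopore-wgs-288418386-FAB39088.fastq.gz"
    # Print a shell-ready command; pass the list to subprocess.run() to execute it.
    print(shlex.join(aws_cp_args(link)))
```

Returning an argument list rather than a single string lets the same helper feed `subprocess.run` directly without shell quoting issues.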
The rel3 release consists of the full dataset and adds two new Rapid kit runs (FAB49908 and FAF04090) that use a new long-DNA extraction method:
- 39 flowcells
- 91,240,120,433 bases (~91.2 Gb)
- 14,183,584 reads
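As a quick sanity check on these totals, the mean read length and an approximate sequencing depth follow from simple division. This sketch assumes a haploid human genome size of ~3.1 Gb, a round figure chosen for illustration rather than one taken from this release.

```python
# Release totals quoted above for rel3.
TOTAL_BASES = 91_240_120_433
TOTAL_READS = 14_183_584
FLOWCELLS = 39

# Assumption: approximate haploid human genome size, used only for a rough depth estimate.
GENOME_SIZE = 3.1e9

mean_read_length = TOTAL_BASES / TOTAL_READS       # bases per read
mean_yield_per_flowcell = TOTAL_BASES / FLOWCELLS  # bases per flowcell
approx_depth = TOTAL_BASES / GENOME_SIZE            # raw fold coverage, before alignment

print(f"mean read length: {mean_read_length:,.0f} bases")
print(f"mean yield per flowcell: {mean_yield_per_flowcell / 1e9:.2f} Gb")
print(f"approximate depth: {approx_depth:.1f}x")
```

With these inputs the mean read length comes out at about 6,433 bases and the raw depth at roughly 29x; aligned depth will be somewhat lower because not every read maps.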
flowcell_id | reads | bases | Date (DD/MM/YY) | Centre | SampleType | Kit | Pore | Links |
---|---|---|---|---|---|---|---|---|
FAB23716 | 356209 | 1409812422 | 14/07/16 | UBC | DNA | Rapid | R9 | FASTQ |
FAB39088 | 658224 | 3287994454 | 19/09/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB39075 | 466329 | 2439355478 | 20/09/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB39043 | 436976 | 2273008592 | 23/09/16 | Bham | DNA | Ligation | R9.4 | FASTQ |
FAB42706 | 430660 | 1966505502 | 12/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB41174 | 117057 | 687394987 | 13/10/16 | Bham | DNA | Ligation | R9.4 | FASTQ |
FAB42260 | 267644 | 1399557161 | 13/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB42804 | 16669 | 75062609 | 14/10/16 | Bham | DNA | Ligation | R9.4 | FASTQ |
FAB42316 | 572838 | 3275026637 | 14/10/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB42205 | 317654 | 1686630108 | 14/10/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB42561 | 233678 | 1520513556 | 19/10/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB42473 | 644869 | 3357548938 | 19/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB42395 | 38291 | 179704035 | 20/10/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB42476 | 435158 | 2363036522 | 27/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB42451 | 817629 | 4530477841 | 28/10/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB42704 | 276152 | 1750149482 | 28/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB42828 | 33527 | 163405138 | 01/11/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB42810 | 322058 | 2020615256 | 02/11/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB42798 | 193551 | 1339441522 | 03/11/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB45280 | 128234 | 799554798 | 11/11/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB46664 | 491346 | 2038018797 | 15/11/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB46683 | 72605 | 286275511 | 17/11/16 | Bham | DNA | Ligation | R9.4 | FASTQ |
FAB45332 | 530938 | 2864140853 | 17/11/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB43577 | 426941 | 2539015084 | 18/11/16 | UCSC | DNA | Ligation | R9.4 | FASTQ |
FAB44989 | 558224 | 3443824633 | 18/11/16 | UCSC | DNA | Ligation | R9.4 | FASTQ |
FAF01169 | 339447 | 2913892142 | 22/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAF01441 | 254705 | 2203636947 | 22/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAB45277 | 53547 | 445641679 | 22/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAB45321 | 299174 | 2584017112 | 22/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAF01127 | 632728 | 4972081712 | 25/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAF01132 | 689781 | 5455971336 | 25/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAB49712 | 632158 | 4906148911 | 28/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAF01253 | 471698 | 3695661984 | 28/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAB45321* | 123037 | 1043504055 | 28/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAB49914 | 309175 | 2841008085 | 28/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAB45271 | 472656 | 3689043164 | 28/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAB49164 | 746333 | 4438258089 | 06/12/16 | UCSC | DNA | Ligation | R9.4 | FASTQ |
FAB49908 | 224380 | 3141600861 | 09/12/16 | Bham | Cells | Rapid | R9.4 | FASTQ |
FAF04090 | 91304 | 1213584440 | 09/12/16 | Bham | Cells | Rapid | R9.4 | FASTQ |
Please verify downloads against MD5 hashes.
[*] This flowcell ID was entered incorrectly and duplicates that of an earlier run (FAB45321, 22/11/16).
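A minimal Python sketch for the MD5 check requested above, assuming you already have the published hash for each file as a 32-character hex string (how the hashes are distributed is not described here, so the `expected` argument is a placeholder). Files are hashed in chunks so that multi-gigabyte FASTQ and BAM files never need to fit in memory.

```python
import hashlib
import sys
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    return md5sum(path) == expected.strip().lower()


# Usage: python verify_md5.py <file> <expected-md5>
if __name__ == "__main__" and len(sys.argv) == 3:
    file_path, expected_md5 = Path(sys.argv[1]), sys.argv[2]
    ok = verify(file_path, expected_md5)
    print(f"{file_path}: {'OK' if ok else 'MISMATCH'}")
    sys.exit(0 if ok else 1)
```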
#### Alignments by flowcell
Reads were aligned with BWA-MEM (commit `5961611c358e480110793bbf241523a3cfac049b`) using the parameters `-x ont2d` against the pre-computed 1000 Genomes GRCh38 BWA index (with decoys) at ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/. Alignment statistics were calculated using `samtools stats` (samtools version 1.3.1).
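The alignment step above can be sketched as a short command pipeline. This is an illustrative reconstruction, not the consortium's actual script: the reference and file names are hypothetical placeholders, and the `samtools sort` and `samtools index` steps are assumptions inferred from the sorted, indexed BAM/BAI files offered below. The commands are only built as strings here.

```python
from __future__ import annotations

import shlex

# Hypothetical paths: substitute the 1000 Genomes GRCh38 (with decoys) BWA index
# and a downloaded flowcell FASTQ.
REFERENCE = "GRCh38_with_decoys.fa"
READS = "rel3-nanopore-wgs-288418386-FAB39088.fastq.gz"
PREFIX = "FAB39088"


def alignment_commands(reference: str, reads: str, prefix: str) -> list[str]:
    """Return shell command strings for align -> sort -> index -> stats."""
    bam = shlex.quote(f"{prefix}.sorted.bam")
    return [
        # BWA-MEM with the Oxford Nanopore 2D preset, piped straight into a coordinate sort.
        f"bwa mem -x ont2d {shlex.quote(reference)} {shlex.quote(reads)}"
        f" | samtools sort -o {bam} -",
        # A coordinate-sorted BAM can be indexed to produce the matching .bai file.
        f"samtools index {bam}",
        # Summary numbers (sequences, reads mapped, reads MQ0, bases mapped, average length).
        f"samtools stats {bam} > {shlex.quote(prefix + '.stats.txt')}",
    ]


if __name__ == "__main__":
    for cmd in alignment_commands(REFERENCE, READS, PREFIX):
        print(cmd)  # run each with subprocess.run(cmd, shell=True, check=True)
```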
flowcell_id | Sequences | Mapped | Mapped MQ0 | Unmapped | Bases Mapped | Avg Length | BAM | BAI |
---|---|---|---|---|---|---|---|---|
FAB23716 | 356209 | 319259 | 26702 | 36950 | 1165998694 | 3957 | BAM | BAI |
FAB39088 | 658224 | 613044 | 35394 | 45180 | 3007307322 | 4995 | BAM | BAI |
FAB39075 | 466329 | 425117 | 28167 | 41212 | 2146453407 | 5230 | BAM | BAI |
FAB39043 | 436976 | 415389 | 21043 | 21587 | 2113140439 | 5201 | BAM | BAI |
FAB42706 | 430660 | 375374 | 17378 | 55286 | 1867123361 | 4566 | BAM | BAI |
FAB41174 | 117057 | 114520 | 4186 | 2537 | 652217119 | 5872 | BAM | BAI |
FAB42260 | 267644 | 246982 | 15624 | 20662 | 1263089767 | 5229 | BAM | BAI |
FAB42804 | 16669 | 13311 | 1755 | 3358 | 53666089 | 4503 | BAM | BAI |
FAB42316 | 572838 | 512994 | 18985 | 59844 | 3100596254 | 5717 | BAM | BAI |
FAB42205 | 317654 | 282502 | 12561 | 35152 | 1601397762 | 5309 | BAM | BAI |
FAB42561 | 233678 | 225141 | 10255 | 8537 | 1420740185 | 6506 | BAM | BAI |
FAB42473 | 644869 | 611138 | 32539 | 33731 | 3112342902 | 5206 | BAM | BAI |
FAB42395 | 38291 | 36477 | 2059 | 1814 | 167168840 | 4693 | BAM | BAI |
FAB42476 | 435158 | 416969 | 20908 | 18189 | 2214880871 | 5430 | BAM | BAI |
FAB42451 | 817629 | 779328 | 36986 | 38301 | 4178966543 | 5540 | BAM | BAI |
FAB42704 | 276152 | 263722 | 12926 | 12430 | 1619875186 | 6337 | BAM | BAI |
FAB42828 | 33527 | 27843 | 2442 | 5684 | 146819837 | 4873 | BAM | BAI |
FAB42810 | 322058 | 305070 | 16802 | 16988 | 1808343119 | 6274 | BAM | BAI |
FAB42798 | 193551 | 185739 | 8749 | 7812 | 1232035338 | 6920 | BAM | BAI |
FAB45280 | 128234 | 122219 | 6336 | 6015 | 743280816 | 6235 | BAM | BAI |
FAB46664 | 491346 | 456247 | 27622 | 35099 | 1862427349 | 4147 | BAM | BAI |
FAB46683 | 72605 | 64739 | 5307 | 7866 | 269213160 | 3942 | BAM | BAI |
FAB45332 | 530938 | 497862 | 26392 | 33076 | 2620752139 | 5394 | BAM | BAI |
FAB43577 | 426941 | 410137 | 19835 | 16804 | 2344990054 | 5946 | BAM | BAI |
FAB44989 | 558224 | 536572 | 25936 | 21652 | 3161900821 | 6169 | BAM | BAI |
FAF01169 | 339447 | 315489 | 16481 | 23958 | 2677881316 | 8584 | BAM | BAI |
FAF01441 | 254705 | 238834 | 12458 | 15871 | 2010117898 | 8651 | BAM | BAI |
FAB45277 | 53547 | 51957 | 2132 | 1590 | 426639054 | 8322 | BAM | BAI |
FAB45321 | 299174 | 283355 | 15165 | 15819 | 2366003310 | 8637 | BAM | BAI |
FAF01127 | 632728 | 605633 | 27192 | 27095 | 4640355789 | 7858 | BAM | BAI |
FAF01132 | 689781 | 655357 | 33564 | 34424 | 4966810089 | 7909 | BAM | BAI |
FAB49712 | 632158 | 612752 | 26264 | 19406 | 4594356245 | 7760 | BAM | BAI |
FAF01253 | 471698 | 454434 | 20639 | 17264 | 3430678969 | 7834 | BAM | BAI |
FAB45321* | 123037 | 118311 | 5891 | 4726 | 952851126 | 8481 | BAM | BAI |
FAB49914 | 309175 | 296250 | 12281 | 12925 | 2673848960 | 9188 | BAM | BAI |
FAB45271 | 472656 | 450702 | 20148 | 21954 | 3468377327 | 7804 | BAM | BAI |
FAB49164 | 746333 | 718351 | 32664 | 27982 | 4107087899 | 5946 | BAM | BAI |
FAB49908 | 224380 | 211060 | 11903 | 13320 | 2898563539 | 14001 | BAM | BAI |
FAF04090 | 91304 | 83164 | 6072 | 8140 | 1085757398 | 13291 | BAM | BAI |
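The columns in this table are related: `Mapped` + `Unmapped` should equal `Sequences`, and `Avg Length` appears to equal the flowcell's total bases from the first table divided by `Sequences` (for FAB39088, 3,287,994,454 / 658,224 ≈ 4,995). The sketch below parses rows in this pipe-delimited layout and derives mapping rates; the column order is taken from the header above, and the two sample rows are copied from the table.

```python
from __future__ import annotations

from dataclasses import dataclass

# Two rows copied from the alignment table above (BAM/BAI link columns omitted).
ROWS = """\
FAB23716 | 356209 | 319259 | 26702 | 36950 | 1165998694 | 3957
FAB39088 | 658224 | 613044 | 35394 | 45180 | 3007307322 | 4995
"""


@dataclass
class AlignmentRow:
    flowcell_id: str
    sequences: int
    mapped: int
    mapped_mq0: int
    unmapped: int
    bases_mapped: int
    avg_length: int

    @property
    def mapped_fraction(self) -> float:
        """Fraction of all reads that aligned."""
        return self.mapped / self.sequences

    @property
    def mq0_fraction(self) -> float:
        """Fraction of mapped reads with mapping quality 0 (typically multi-mapping)."""
        return self.mapped_mq0 / self.mapped


def parse_rows(text: str) -> list[AlignmentRow]:
    """Parse pipe-delimited rows; only the first seven non-empty fields are used."""
    rows = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split("|") if f.strip()]
        flowcell_id, *numbers = fields[:7]
        row = AlignmentRow(flowcell_id, *(int(n) for n in numbers))
        # Sanity check: every read is either mapped or unmapped.
        if row.mapped + row.unmapped != row.sequences:
            raise ValueError(f"inconsistent counts for {flowcell_id}")
        rows.append(row)
    return rows


for r in parse_rows(ROWS):
    print(f"{r.flowcell_id}: {r.mapped_fraction:.1%} mapped, "
          f"{r.mq0_fraction:.1%} of mapped reads at MQ0")
```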
Flowcell alignments were merged and split into individual chromosomes using `samtools merge`.
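One way to perform a per-chromosome split like this is to run `samtools merge` once per chromosome with its `-R` region option, so that only reads overlapping that chromosome are pulled from every flowcell BAM. This is a sketch: the exact invocation used for this release is not stated, and the file names are hypothetical. Region-restricted merging needs coordinate-sorted, indexed inputs, and each output BAM would then be indexed to give the BAI files listed below.

```python
from __future__ import annotations

import shlex

# Chromosomes listed in the per-chromosome table below.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def merge_command(chrom: str, flowcell_bams: list[str]) -> str:
    """Build a samtools merge command restricted to one chromosome via -R."""
    out_bam = f"{chrom}.bam"  # hypothetical output naming
    inputs = " ".join(shlex.quote(b) for b in flowcell_bams)
    return f"samtools merge -R {chrom} {shlex.quote(out_bam)} {inputs}"


if __name__ == "__main__":
    bams = ["FAB23716.sorted.bam", "FAB39088.sorted.bam"]  # hypothetical names
    for chrom in CHROMOSOMES:
        print(merge_command(chrom, bams))
```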
Chrom | Mapped # | Mapped MQ0 | Bases Mapped | Avg Length | BAM | BAI |
---|---|---|---|---|---|---|
chr1 | 1075867 | 43397 | 6829526262 | 6744 | BAM | BAI |
chr2 | 1062314 | 31802 | 6755642896 | 6842 | BAM | BAI |
chr3 | 858643 | 24189 | 5487703898 | 6757 | BAM | BAI |
chr4 | 845677 | 30723 | 5395140705 | 6890 | BAM | BAI |
chr5 | 774613 | 23499 | 4953273570 | 6821 | BAM | BAI |
chr6 | 723047 | 24496 | 4618883250 | 6762 | BAM | BAI |
chr7 | 696473 | 28231 | 4382999832 | 6772 | BAM | BAI |
chr8 | 617988 | 23361 | 3968911801 | 6844 | BAM | BAI |
chr9 | 539660 | 25898 | 3428430670 | 6764 | BAM | BAI |
chr10 | 594688 | 20787 | 3805443564 | 6845 | BAM | BAI |
chr11 | 583055 | 17748 | 3710684724 | 6855 | BAM | BAI |
chr12 | 586663 | 17891 | 3734922623 | 6840 | BAM | BAI |
chr13 | 440615 | 17662 | 2844212242 | 6904 | BAM | BAI |
chr14 | 383777 | 15752 | 2439119767 | 6713 | BAM | BAI |
chr15 | 359853 | 19556 | 2268233023 | 6838 | BAM | BAI |
chr16 | 386401 | 22680 | 2425913744 | 6787 | BAM | BAI |
chr17 | 369036 | 22907 | 2302471086 | 6661 | BAM | BAI |
chr18 | 339094 | 13053 | 2172098564 | 6807 | BAM | BAI |
chr19 | 257039 | 10926 | 1472760724 | 6266 | BAM | BAI |
chr20 | 291960 | 13226 | 1829244829 | 6659 | BAM | BAI |
chr21 | 192383 | 24988 | 1207807437 | 6792 | BAM | BAI |
chr22 | 172934 | 10514 | 1041347396 | 6665 | BAM | BAI |
chrX | 658347 | 28769 | 4210769167 | 7076 | BAM | BAI |
chrY | 23378 | 5292 | 133803203 | 7869 | BAM | BAI |
chrM | 59363 | 658 | 91949786 | 1628 | BAM | BAI |
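Dividing `Bases Mapped` by chromosome length gives a rough mean depth per chromosome. The lengths below are GRCh38 values for three chromosomes; the estimate ignores clipping, MQ0 reads and reference gaps, so treat it as an approximation rather than a measured coverage.

```python
# Bases mapped, copied from the per-chromosome table above.
BASES_MAPPED = {
    "chr1": 6_829_526_262,
    "chr2": 6_755_642_896,
    "chrX": 4_210_769_167,
}

# GRCh38 chromosome lengths in bp (these include N-gap runs).
CHROM_LENGTH = {
    "chr1": 248_956_422,
    "chr2": 242_193_529,
    "chrX": 156_040_895,
}


def mean_depth(chrom: str) -> float:
    """Approximate mean depth = aligned bases / chromosome length."""
    return BASES_MAPPED[chrom] / CHROM_LENGTH[chrom]


for chrom in BASES_MAPPED:
    print(f"{chrom}: ~{mean_depth(chrom):.1f}x")
```

For these three chromosomes the estimate comes out at roughly 27 to 28x.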
FAST5 files have been split by chromosome according to the above alignments, so some files may appear in more than one archive (duplicates can be removed by filename). Each complete 'part' contains 100,000 reads, approximately sorted by position along the chromosome to aid region-by-region analysis.
Uploads are not yet complete.
Chrom | Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6 | Part 7 | Part 8 | Part 9 |
---|---|---|---|---|---|---|---|---|---|
chr1 | part1 (391 G) | part2 (291 G) | part3 (284 G) | part4 (265 G) | part5 (265 G) | part6 (242 G) | part7 (269 G) | part8 (202 G) | part9 (205 G) |
chr2 | part1 (395 G) | part2 (311 G) | part3 (279 G) | part4 (287 G) | part5 (288 G) | part6 (300 G) | part7 (266 G) | part8 (247 G) | part9 (223 G) |
chr3 | part1 (338 G) | part2 (310 G) | part3 (308 G) | part4 (249 G) | part5 (290 G) | part6 (265 G) | part7 (278 G) | part8 (220 G) | part9 (236 G) |
chr4 | part1 (423 G) | part2 (346 G) | part3 (344 G) | part4 (245 G) | part5 (321 G) | part6 (237 G) | part7 (379 G) | part8 (214 G) | part9 (213 G) |
chr5 | part1 (385 G) | part2 (393 G) | part3 (286 G) | part4 (286 G) | part5 (264 G) | part6 (298 G) | part7 (259 G) | part8 (215 G) | part9 (207 G) |
chr6 | part1 (313 G) | part2 (319 G) | part3 (298 G) | part4 (318 G) | part5 (263 G) | part6 (258 G) | part7 (264 G) | part8 (230 G) | part9 (207 G) |
chr7 | part1 (366 G) | part2 (332 G) | part3 (308 G) | part4 (335 G) | part5 (299 G) | part6 (243 G) | part7 (231 G) | part8 (242 G) | part9 (238 G) |
chr8 | part1 (354 G) | part2 (309 G) | part3 (303 G) | part4 (265 G) | part5 (274 G) | part6 (247 G) | part7 (261 G) | part8 (214 G) | part9 (177 G) |
chr9 | part1 (352 G) | part2 (308 G) | part3 (247 G) | part4 (278 G) | part5 (263 G) | part6 (301 G) | part7 (226 G) | part8 (146 G) | |
chr10 | part1 (367 G) | part2 (337 G) | part3 (296 G) | part4 (282 G) | part5 (280 G) | part6 (245 G) | part7 (233 G) | part8 (258 G) | part9 (45 G) |
chr11 | part1 (363 G) | part2 (309 G) | part3 (290 G) | part4 (266 G) | part5 (287 G) | part6 (306 G) | part7 (232 G) | part8 (239 G) | part9 (10 G) |
chr12 | part1 (386 G) | part2 (323 G) | part3 (259 G) | part4 (278 G) | part5 (290 G) | part6 (271 G) | part7 (242 G) | part8 (256 G) | part9 (62 G) |
chr13 | part1 (307 G) | part2 (326 G) | part3 (335 G) | part4 (327 G) | part5 (306 G) | part6 (244 G) | part7 (123 G) | | |
chr14 | part1 (356 G) | part2 (363 G) | part3 (306 G) | part4 (235 G) | part5 (292 G) | part6 (149 G) | | | |
chr15 | part1 (322 G) | part2 (328 G) | part3 (322 G) | part4 (262 G) | part5 (259 G) | | | | |
chr16 | part1 (347 G) | part2 (327 G) | part3 (276 G) | part4 (308 G) | part5 (259 G) | part6 (120 G) | | | |
chr17 | part1 (330 G) | part2 (281 G) | part3 (273 G) | part4 (263 G) | part5 (310 G) | part6 (19 G) | | | |
chr18 | part1 (386 G) | part2 (315 G) | part3 (337 G) | part4 (264 G) | part5 (320 G) | | | | |
chr19 | part1 (417 G) | part2 (320 G) | part3 (286 G) | part4 (228 G) | | | | | |
chr20 | part1 (352 G) | part2 (285 G) | part3 (281 G) | part4 (300 G) | part5 (06 G) | | | | |
chr21 | part1 (329 G) | part2 (395 G) | part3 (290 G) | | | | | | |
chrX | part1 (592 G) | part2 (284 G) | part3 (285 G) | part4 (274 G) | part5 (280 G) | part6 (309 G) | part7 (227 G) | part8 (261 G) | part9 (228 G) |
chrY | part1 (584 G) | | | | | | | | |
chrM | part1 (33 G) | | | | | | | | |
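Because a read can land in more than one chromosome archive, a combined file list should be de-duplicated by filename, as noted above. This sketch assumes each archive is a tar file readable by Python's `tarfile` module (the archive format is not stated here) and that archive names such as `chr1.part1.tar` are hypothetical. The pure `unique_by_basename` function does the actual de-duplication, keeping the first archive in which each file name appears.

```python
from __future__ import annotations

import tarfile
from pathlib import PurePosixPath


def unique_by_basename(listings: dict[str, list[str]]) -> dict[str, str]:
    """Map each FAST5 file name to the first archive (in insertion order) containing it."""
    first_seen: dict[str, str] = {}
    for archive, members in listings.items():
        for member in members:
            name = PurePosixPath(member).name
            if name.endswith(".fast5"):
                first_seen.setdefault(name, archive)
    return first_seen


def list_tar(path: str) -> list[str]:
    """List member paths of a (possibly compressed) tar archive."""
    with tarfile.open(path) as tar:
        return tar.getnames()


# Usage with downloaded archives (hypothetical names):
# listings = {p: list_tar(p) for p in ["chr1.part1.tar", "chr2.part1.tar"]}
# keep = unique_by_basename(listings)
```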
Unpolished Canu assembly (contigs) of all the nanopore data above, kindly contributed by Adam Phillippy and Sergey Koren.
- Contigs: 2,886
- Bases: 2,646,010,004
- Min: 1,673
- Max: 27,160,256
- NG25: 6,437,016 (count: 80)
- NG50: 2,963,950 (count: 266)
- NG75: 670,702 (count: 776)
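NGx statistics like these are computed against a reference genome size rather than the total assembly size: sort the contigs longest first and report the contig length, and the number of contigs, at which the cumulative length first reaches x% of the genome size. Omitting the genome size gives the ordinary Nx, which is how a read N50 such as the one in the figure caption below is defined (with reads in place of contigs). The genome size Canu used for these NGx values is not stated here, and the toy numbers in the test are purely illustrative.

```python
from __future__ import annotations

from typing import Optional


def nx_stat(lengths: list[int], fraction: float,
            genome_size: Optional[int] = None) -> tuple[int, int]:
    """Return (length, count) for NGx, or for Nx when genome_size is None.

    `fraction` is x/100, e.g. 0.5 for N50/NG50. `count` is the number of
    contigs (or reads) needed to reach that fraction of the denominator.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    denominator = genome_size if genome_size is not None else sum(lengths)
    target = fraction * denominator
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("lengths do not reach the requested fraction of the genome size")
```

Because NGx uses a fixed genome size as the denominator, it can be compared across assemblies of different total length, whereas Nx shifts whenever the assembly grows or shrinks.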
Figure: A typical read length distribution from a flowcell run with a cell-extracted DNA library. The y-axis shows the count of bases. The mean read length is ~8.6 kb and the N50 is ~12.5 kb (vertical line). Reads longer than 60 kb are not expected due to limitations of the QIAGEN extraction kit used.
This dataset is subject to rapid change as we continue to post runs, so some statistics here may not represent complete nanopore runs.
We would like to acknowledge the support of Oxford Nanopore Technologies in generating this dataset, with particular thanks to Rosemary Dokos, Oliver Hartwell, Jonathan Pugh and Clive Brown. We would like to thank Radoslaw Poplawski and Simon Thompson for technical assistance with configuring and optimising the CLIMB platform file system. We are grateful to Angel Pizarro and Jed Sundwall at Amazon Web Services for hosting this dataset as an AWS Open Data set.
Please raise any issues concerning this dataset on this GitHub repository. A preprint describing the dataset in more detail will be available shortly.
* rel1: 1st December 2016. Initial release.
* rel2: 5th December 2016. 25 flowcells, 58,958,035,887 bases, 9,053,909 reads