



This repository contains data indexes from NIST's Genome in a Bottle (GIAB) project. The indexes for sequences and alignments are also available under: ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes .


Son:HG002     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/
Father:HG003    https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG003_NA24149_father/
Mother:HG004     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG004_NA24143_mother/

| Sequencing Platform | Sequence | Alignment |
|---|---|---|
| Illumina WGS 2x150bp 300X per individual | All HG002 HG003 HG004 | novoalign: All HG002 HG003 HG004 |
| Illumina 6KB Matepair | All HG002 HG003 HG004 | bwamem:hg19 All HG002 HG003 HG004 |
| Illumina WGS 2x250bp | All HG002 HG003 HG004 | isaac:hg19 All HG002 HG003 HG004; novoalign: All HG002 HG003 HG004 |
| Moleculo | All HG002 HG003 HG004 | |
| Illumina Whole Exome | - | bwamem:hg19 All HG002 HG003 HG004 |
| SOLiD 60x for son | All HG002 | LifeScope:hg19 All HG002 |
| CompleteGenomics | - | CGAtools:hg19 All HG002 HG003 HG004 |
| Ion Proton 1000x Exome | - | TMAP:hg19 All HG002 HG003 HG004 |
| 10X Genomics | - | bwamem:hg19 All HG002 HG003 HG004 |
| 10X Genomics ChromiumGenome | All HG002 | LongRanger2.0:hg19 All HG002 HG003 HG004 |
| BioNano | All:bnx HG002:bnx HG003:bnx HG004:bnx | All:cmap HG002 HG003 HG004 |
| PacBio 70x/30x/30x | All HG002 HG003 HG004; All:hdf5 HG002 HG003 HG004 | blasr:hg19 All HG002 HG003 HG004; bwamem:hg19 All HG002 HG003 HG004 |
| PacBio CCS 10kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 11kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 15kb | All HG002 | pbmm2:hg19 All HG002 |
| PacBio CCS 15kb_20kb chemistry2 | All HG002 | pbmm2: All HG002 HG003 HG004 |
| Oxford Nanopore 2D | All HG002 | - |
| Oxford Nanopore ultralong (guppy-V3.2.4_2020-01-22) | All HG002 | minimap2:whatshap:hg19 All HG002 |
| Oxford Nanopore ultralong Promethion | All HG002 HG003 HG004 | - |
| BGI BGISEQ500 | All HG002 | - |
| BGI MGISEQ PCR-free | All HG002 | - |
| BGI stLFR | All HG002 HG003 HG004 | All:bwamem:hg19 HG002 HG003 HG004 |
| Strand-Seq HG002 by BCCRC | All HG002 | - |

* CompleteGenomics LFR raw and alignment data are not available, but analysis results are available under: ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/CompleteGenomics_newLFR_CGAtools_06122015/


Son:HG005     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG005_NA24631_son/
Father:HG006     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG006_NA24694-huCA017E_father/
Mother:HG007     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ChineseTrio/HG007_NA24695-hu38168_mother/

| Sequencing Platform | Sequence | Alignment |
|---|---|---|
| Illumina WGS 2x250bp 300X for son; 2x150bp 100x for parents | All HG005 HG006 HG007 | novoalign: All:hg19-hg38 HG005:hg19-hg38 HG006:hg19-hg38 HG007:hg19-hg38 |
| Illumina 6KB Matepair | All HG005 HG006 HG007 | |
| Moleculo | All HG005 HG006 HG007 | |
| SOLiD 60x for son | All:xsq HG005:xsq | LifeScope: All:hg19 HG005:hg19 |
| CompleteGenomics | | CGAtools: All:hg19 (RMDNA) HG005:hg19 HG006:hg19 HG007:hg19; CGAtools: All:hg19 (cellsDNA) HG005:hg19 |
| Illumina Whole Exome | | bwamem: All:hg19 HG005:hg19 |
| Ion Proton 1000x Exome | | TMAP: All:hg19 HG005:hg19 |
| BioNano for son | All:bnx HG005:bnx | All:hg19 (cmap) HG005:hg19 (cmap) |
| PacBio Sequel for the trio | All HG005 HG006 HG007 | |
| PacBio SequelII CCS 11kb | sequence.index | |
| BGI BGISEQ500 | sequence.index | |
| BGI MGISEQ | sequence.index | |
| BGI stLFR | sequence.index | |


NA12878:HG001     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/

| Sequencing Platform | Sequence | Alignment |
|---|---|---|
| Illumina WGS 2x150bp 300X | HG001 | bwamem: HG001:hg19 (downsampled30x); novoalign: HG001 |
| Illumina HiSeq Exome | HG001 | bwamem: HG001:hg19 |
| Illumina TruSeq Exome | | bwamem: HG001:hg19 |
| 10X Genomics | | bwamem: HG001:hg19; bwamem: HG001:hg19 (size_selected) |
| 10X Genomics ChromiumGenome | | LongRanger2.0: HG001:hg19-hg38; LongRanger2.1: HG001:hg19-hg38 |
| CompleteGenomics | | CGAtools: HG001:hg19 |
| Ion Proton 1000x Exome | | TMAP: HG001:hg19 |
| NA12878 SOLiD5500W | | LifeScope: HG001:hg19 |
| BGI BGISEQ500 | sequence.index | |
| BGI MGISEQ | sequence.index | |
| BGI stLFR | sequence.index | |
| PacBio 40x | HG001:hdf5 | |
| PacBio SequelII CCS 11kb | sequence.index | |
| Ultralong_OxfordNanopore | - | minimap2: HG001 |

  • CompleteGenomics LFR raw and alignment data are not available, but analysis results are available under: ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/CompleteGenomics_newLFR_CGAtools_06122015/ .

Please note:
1. To download raw sequencing data (fastq, fasta, hdf5, xsq, bnx, etc.) for your analysis, use the sequence.index.* files.
2. To download aligned data (bam, xmap/cmap, etc.) for your analysis, use the alignment.index.* files.
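
As a rough illustration of how an index file might be used for downloading, the sketch below collects every URL-like field from an index and offers a small download helper. It assumes the index is tab-separated text whose data rows contain full file URLs; the column names, the `example.org` URL, and the helper names `extract_urls` and `download` are hypothetical and not taken from this repository. Check the header of the specific sequence.index.* or alignment.index.* file before relying on it.

```python
import csv
import io
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

URL_PREFIXES = ("ftp://", "http://", "https://")


def extract_urls(index_text):
    """Return every URL-like field in a tab-separated index, in file order."""
    urls = []
    reader = csv.reader(io.StringIO(index_text), delimiter="\t")
    for row in reader:
        for field in row:
            field = field.strip()
            # Header names and checksum fields do not start with a URL scheme,
            # so they are skipped without knowing the column layout.
            if field.startswith(URL_PREFIXES):
                urls.append(field)
    return urls


def download(url, dest_dir="."):
    """Fetch one URL into dest_dir, keeping the remote file name."""
    filename = os.path.basename(urlparse(url).path)
    dest = os.path.join(dest_dir, filename)
    urlretrieve(url, dest)
    return dest


# Hypothetical index content; real files may use different columns.
sample = (
    "FASTQ\tFASTQ_MD5\tNIST_SAMPLE_NAME\n"
    "ftp://example.org/reads_R1.fastq.gz\td41d8cd98f00b204e9800998ecf8427e\tHG002\n"
)

urls = extract_urls(sample)
print(urls)
```

To fetch the files, read a real index with `open(path).read()`, pass the text to `extract_urls`, and call `download(url)` for each result. Comparing each downloaded file against any checksum column the index provides is a sensible extra step.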