Pangenome graphs (review article on graph-based pangenomic methods)

Pangenome graphs

A review paper for the Annual Review of Genomics and Human Genetics.

The final submitted version of the paper has been rendered and is provided in this repo.

Notes

Work on GitHub (Erik to set up the structure). Use .bib for citations. Write one line per sentence. The first draft doesn’t have to compile.
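As a minimal sketch of these conventions, a section file might look like the fragment below. The citation key `garrison2018vg` and the sentences themselves are hypothetical placeholders, not text or keys taken from the paper.

```latex
% sections/intro.tex -- illustrative only; the wording and the citation key are placeholders
\section{Introduction}

Pangenomic models represent many genomes at once.
Each sentence sits on its own line, which keeps diffs small and merges simple.
Citations refer to keys defined in the .bib file~\cite{garrison2018vg}.
```

Keeping one sentence per line means that edits by different authors to the same paragraph are less likely to conflict in version control.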

Outline and division of effort

  • Introduction - Erik (sections/intro.tex)
    • Why we need pangenomic models
    • What is our motivation for thinking about pangenomic approaches?
      • Bias
      • Populations
      • Precision medicine
    • Perspective of interfaces (inputs and outputs)
    • Past reviews
  • Building pangenomic models (sections/models.tex)
    • Constructing graphs - Robin
    • Indexing and succinct genome graph models - Jouni / Erik?
    • Other population-ish succinct data structures - Erik / Jouni?
      • De Bruijn
      • VCFs / genotype calls / haplotypes / binary matrices
      • Alignments / collections of strings
  • Relating new information to the model (sections/relating.tex)
    • Visualization - Adam
    • Finding structures in pangenome graphs - Jordan
    • Graph alignment algorithms - Jordan
    • Variation graph mappers - Xian
    • De Bruijn graph mappers - Robin
    • Non-graph population mapping tools - Erik
  • Applications of pangenomic models (sections/applications.tex)
    • Error correction - Robin
    • Variant calling / Genotyping - Glenn
    • Assembly - Erik
    • Epigenomics - Glenn
    • Transcriptomics - Jonas
    • Metagenomics and quasispecies - Jonas
  • Discussion - Benedict (sections/discussion.tex)

References

See bib/references.bib for a subset of the citations below in BibTeX format. These entries were auto-generated. The rest may need to be added manually (e.g. from Google Scholar citations).

Introduction

...

Past reviews / Opinion pieces

Computational pan genomics (2016) https://doi.org/10.1093/bib/bbw089

Genome graphs and genome inference (2017) https://doi.org/10.1101/gr.214155.116

Is it time to change the reference genome? (2019) https://doi.org/10.1186/s13059-019-1774-4

Hackathon Paper (2019) http://dx.doi.org/10.12688/f1000research.19630.1

One reference genome is not enough (2019) http://dx.doi.org/10.1186/s13059-019-1717-0

Constructing graphs

Coordinates and intervals on genome graphs (preprint 2016) http://dx.doi.org/10.1101/063206

FORGe (2018) https://doi.org/10.1186/s13059-018-1595-x

NovoGraph (2018) https://doi.org/10.12688/f1000research.15895.1

HUPAN (2019) https://doi.org/10.1186/s13059-019-1751-y

Bake off (preprint 2017) http://dx.doi.org/10.1101/101378

VG toolkit paper (2018) https://doi.org/10.1038/nbt.4227

EG’s thesis (2019) -- describes vg construct, seqwish, and vg msga https://doi.org/10.17863/CAM.41621

Minigraph (2019)

GenomeMapper (2009) https://genomebiology.biomedcentral.com/articles/10.1186/gb-2009-10-9-r98

Graph alignment algorithms

Classic (but little-known) DP for aligning to (cyclic) graphs (2000) http://dx.doi.org/10.1016/S0304-3975(99)00333-3

Approximate matching of regular expressions (1989) http://dx.doi.org/10.1016/S0092-8240(89)80046-1

A New Method That Simultaneously Aligns and Reconstructs Ancestral Sequences for Any Number of Homologous Sequences, When the Phylogeny Is Given (1989) http://dx.doi.org/10.1093/oxfordjournals.molbev.a040577

Partial order alignment (2002) https://doi.org/10.1093/bioinformatics/18.3.452

PO-POA (2004) -- DAG to DAG alignment and MSA construction https://doi.org/10.1093/bioinformatics/bth126

Adam’s context mapping (2015) https://doi.org/10.1093/bioinformatics/btv435

Leonardsen’s master’s thesis on Adam’s context mapping (2016) https://www.semanticscholar.org/paper/Aligning-reads-against-a-graph-based-reference-Leonardsen/cb05ae5be6c29bfd220c43402a8657fa21e47c54

Complexity of string matching for graphs (2019) https://doi.org/10.4230/LIPIcs.ICALP.2019.55

V-ALIGN sequence alignment on directed graphs (preprint 2017) -- this has an official publication (http://dx.doi.org/10.1089/cmb.2017.0264), but it’s paywalled https://doi.org/10.1101/124941

Aligning sequences to general graphs in O(V + mE) time (preprint 2017) http://dx.doi.org/10.1101/216127 (Note that similar results have been published by Navarro in 2000, see above)

Bit-parallel sequence to graph alignment (2019) https://doi.org/10.1093/bioinformatics/btz162

On the complexity of sequence to graph alignment (preprint 2019) http://dx.doi.org/10.1101/522912

PaSGAL Accelerating sequence to graph alignment (preprint 2019) https://doi.org/10.1101/651638

Indexing and succinct genome graph models

Blight library -- minimizers for DBGs (preprint 2019) https://www.biorxiv.org/content/10.1101/546309v2

CHOP: haplotype indexing in graphs (preprint 2018) https://doi.org/10.1101/305268

PSI -- pan genomic seed index (2019) https://doi.org/10.1093/bioinformatics/btz341

Improved encoding of genetic variation in BWT (preprint 2019) http://dx.doi.org/10.1101/658716

BWBBLE (2013) https://doi.org/10.1093/bioinformatics/btt215

Gramtools / vBWT (2016) https://doi.org/10.1007/978-3-319-43681-4_18

GCSA (2014) https://doi.org/10.1109/TCBB.2013.2297101

GCSA2 (2016) https://doi.org/10.1137/1.9781611974768.2

Master’s thesis on distance metrics in variant graphs https://www.duo.uio.no/handle/10852/57798

Validating paired end reads in sequence graphs (preprint 2019) http://dx.doi.org/10.1101/682799

Sparse dynamic programming on DAGs of small width (2019) https://doi.org/10.1145/3301312

gPBWT (2017) https://doi.org/10.1186/s13015-017-0109-9

GBWT (preprint 2018) https://arxiv.org/abs/1805.03834

Efficient Construction of a Complete Index for Pan-Genomics Read Alignment (preprint Nov 2018) https://doi.org/10.1101/472423

Other population-ish succinct data structures

PanCake - representing aligned sequences (2013) https://doi.org/10.4230/OASIcs.GCB.2013.35

FM index of an alignment (2016) https://doi.org/10.1016/j.tcs.2015.08.008

FM index of a gapped alignment (2018) https://doi.org/10.1016/j.tcs.2017.02.020

Journaled string tree (2014) https://doi.org/10.1093/bioinformatics/btu438

Population BWT -- reference free sequences (2017) https://doi.org/10.1101/gr.211748.116

Making a DBG with BWT https://doi.org/10.1093/bioinformatics/btv603

Bloom Filter Trie -- pan genome storage (2015) https://doi.org/10.1007/978-3-662-48221-6_16

Multi-BRWT -- colored DBG (2018) https://doi.org/10.3929/ethz-b-000314581

PufferFish -- colored DBG (2018) https://doi.org/10.1093/bioinformatics/bty292

Metannot - colored DBG (preprint 2017) https://doi.org/10.1101/236711

GTC - VCF files (2018) https://doi.org/10.1093/bioinformatics/bty023

MuGI - VCF files (2014) https://doi.org/10.1371/journal.pone.0109384

Compressing large VCFs (2013) https://doi.org/10.1093/bioinformatics/btt460

Tomahawk ...

PBWT -- phased VCFs (2014) https://doi.org/10.1093/bioinformatics/btu014

BGT - VCFs (2016) https://doi.org/10.1093/bioinformatics/btv613

Complete index for pan genomic alignment (2019) https://doi.org/10.1007/978-3-030-17083-7_10

DBGs https://www.pnas.org/content/98/17/9748.short

Colored DBGs https://www.nature.com/ng/journal/v44/n2/abs/ng.1028.html

BiFrost https://www.biorxiv.org/content/10.1101/695338v2.abstract

PanTools (k-mer-based annotations; just uses Neo4j) https://doi.org/10.1093/bioinformatics/btw455

SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips (2014) https://doi.org/10.1093/bioinformatics/btu756

Finding structures in pangenome graphs

Bubbles (various) Bubbleparse (2013) https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0060058

Superbubbles (various) ...

Context mapping (?) ...

Snarls (2018) https://doi.org/10.1089/cmb.2017.0251

SPQR tree decomposition https://en.wikipedia.org/wiki/SPQR_tree

Flow sort (2018) https://doi.org/10.1089/cmb.2017.0248

Minimum founder reconstruction on genome graphs (2019) https://doi.org/10.1186/s13015-019-0147-6

Variation graph mappers

VG (2018) https://doi.org/10.1038/nbt.4227

deBGA-VARA (2019) https://doi.org/10.1109/bibm.2018.8621555

HISAT2 (2019) https://doi.org/10.1038/s41587-019-0201-4

GenomeMapper (2009) https://doi.org/10.1186/gb-2009-10-9-r98

V-MAP (2019) https://doi.org/10.4230/LIPIcs.WABI.2019.7

7 bridges (2019) https://doi.org/10.1038/s41588-018-0316-4

GraphAligner (2019) -- also in the alignment section. DP algorithm: https://doi.org/10.1093/bioinformatics/btz162 Tool preprint: https://doi.org/10.1101/810812

De Bruijn graph mappers

BrownieAligner (2018) https://doi.org/10.1186/s12859-018-2319-7

BlastGraph (2012) http://www.stringology.org/event/2012/p06.html

BGREAT (2016) https://doi.org/10.1186/s12859-016-1103-9

deBGA (2016) https://doi.org/10.1093/bioinformatics/btw371

Non-graph population mapping tools

AltHapAlignR (2018) https://doi.org/10.1093/bioinformatics/bty125

CHIC (preprint 2017) http://dx.doi.org/10.1101/178129

Visualization

Tube maps (2019) https://doi.org/10.1093/bioinformatics/btz597

Bandage (2015) https://doi.org/10.1093/bioinformatics/btv383

EG’s thesis https://doi.org/10.17863/CAM.41621

GfaViz (2019) https://doi.org/10.1093/bioinformatics/bty1046

Assembly Graph Browser (2019) https://doi.org/10.1093/bioinformatics/btz072

SGTK (2019) https://doi.org/10.1093/bioinformatics/bty956

Downstream use cases

Error correction

Lordec (2014) http://dx.doi.org/10.1093/bioinformatics/btu538

Bcool (2019) https://doi.org/10.1093/bioinformatics/btz102

BCT (preprint 2019) http://dx.doi.org/10.1101/673624

GraphAligner (preprint 2019) -- already mentioned as an aligner above https://doi.org/10.1101/810812

Variant calling / Genotyping

Cortex (2012) https://www.nature.com/articles/ng.1028

Bubbleparse (2013) https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0060058

1000GP phase 3 paper (2015) -- graph based genotyping process described in supplement https://doi.org/10.1038/nature15393

PanVC (2018) https://doi.org/10.1186/s12864-018-4465-8

HISAT-Genotype (2019) -- shared paper with HISAT2 https://doi.org/10.1038/s41587-019-0201-4

PRG (2015) https://doi.org/10.1038/ng.3257

HLA/PRG (2016) https://doi.org/10.1371/journal.pcbi.1005151

HLA/LA (2019) https://doi.org/10.1093/bioinformatics/btz235

Paragraph (preprint 2019) http://dx.doi.org/10.1101/635011

Vg call for SVs (preprint 2019) https://www.biorxiv.org/content/10.1101/654566v1.abstract

ExpansionHunter (preprint 2019) http://dx.doi.org/10.1101/572545

GraphTyper (2017) https://doi.org/10.1038/ng.3964

BayesTyper (2018) https://doi.org/10.1038/s41588-018-0145-5

Kourami (2018) https://doi.org/10.1186/s13059-018-1388-2

Epigenomics

GraphPeakCaller (2019) https://doi.org/10.1371/journal.pcbi.1006731

Personalized and graph genomes reveal missing signal in epigenomic data (preprint 2019) http://dx.doi.org/10.1101/457101

Transcriptomics

Quantifies RNA-seq reference-bias (2009) https://doi.org/10.1093/bioinformatics/btp579

GSNAP: SNP-aware mapper (2010) https://www.doi.org/10.1093/bioinformatics/btq057

AlleleSeq: Diploid personal genome mapping (2011) https://doi.org/10.1038/msb.2011.54

MMSEQ: Diploid transcriptome (2011) https://doi.org/10.1186/gb-2011-12-2-r13

Quantifies RNA-seq reference-bias (2014) https://doi.org/10.1186/s13059-014-0467-2

Describes reference-bias in relation to ASE (2015) https://doi.org/10.1186/s13059-015-0762-6

WASP: reference-bias correction (2015) https://doi.org/10.1038/nmeth.3582

rPGA: Personal genome mapping (2015) https://doi.org/10.1093/nar/gkv1099

Kallisto: de Bruijn graph pseudo-alignment (2015) https://doi.org/10.1038/nbt.3519

ASElux: SNP-aware alignment (2017) https://doi.org/10.1093/bioinformatics/btx762

ASGAL: Splice-graph mapper (2018) https://link.springer.com/chapter/10.1007/978-3-319-58163-7_3 https://www.doi.org/10.1186/s12859-018-2436-3

AltHapAlignR: Mapping to alternative reference haplotypes (2018) https://doi.org/10.1093/bioinformatics/bty125

iMapSplice: Mapping to alternative reference bases (2018) https://doi.org/10.1371/journal.pone.0201554

EMASE: Alignment to a diploid transcriptome (2018) https://doi.org/10.1093/bioinformatics/bty078

HISAT2: Variation graph mapper (2019) - also mentioned in the variation graph mapping section https://doi.org/10.1038/s41587-019-0201-4

Metagenomics and quasispecies

Mykrobe predictor (2015) https://doi.org/10.1038/ncomms10063

MetaKallisto (2017) https://doi.org/10.1093/bioinformatics/btx106

Metagenomic classification and assembly review (2017) https://doi.org/10.1093/bib/bbx120

GROOT (2018) https://doi.org/10.1093/bioinformatics/bty387

Virus-VG (2019) https://doi.org/10.1093/bioinformatics/btz443

VG-Flow (2019) https://doi.org/10.1101/645721