2022-pangenome-graphs-intro

Lecture given at the CGSI, UCLA, July 22nd 2022.

Slides: https://docs.google.com/presentation/d/1KBckpDnKlDZpvRktt_RSAxUCTcOA9n83ERkDqNo3JKs/edit?usp=sharing

Video: https://www.youtube.com/watch?v=ve07C4hY94Y

Rationale

We'll show how several common types of pangenome graphs can be constructed in practice, using a simple example.

Data

The data consists of two E. coli genomes (K-12 and O157), located in the data/ folder.

de Bruijn graph

Create a compacted de Bruijn graph (dBG) with k=31 using bcalm2:

../tools/bcalm -in ../data/two_ecolis.fasta -kmer-size 31 -abundance-min 1
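The following is a minimal Python sketch of what a compacted dBG is. Distinct k-mers become nodes, k-mers overlapping by k-1 bases are linked, and maximal non-branching paths are merged into unitigs. It is not how bcalm2 works internally: bcalm2 uses canonical k-mers, handles both strands and scales to whole genomes. The function name compacted_dbg is hypothetical, and the sketch only looks at the forward strand.

```python
from collections import defaultdict

def compacted_dbg(seqs, k):
    """Return the unitigs of a node-centric de Bruijn graph built from seqs.

    Simplification (hypothetical sketch): forward strand only, no canonical k-mers.
    """
    nodes = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
    succ, pred = defaultdict(set), defaultdict(set)
    for u in nodes:
        for base in "ACGT":
            v = u[1:] + base  # candidate successor overlapping u by k-1 bases
            if v in nodes:
                succ[u].add(v)
                pred[v].add(u)

    def merge_next(u):
        # u is merged with its successor only if neither side branches.
        if len(succ[u]) != 1:
            return None
        v = next(iter(succ[u]))
        return v if len(pred[v]) == 1 else None

    def starts_unitig(u):
        # u starts a unitig unless its unique predecessor merges into it.
        if len(pred[u]) != 1:
            return True
        return merge_next(next(iter(pred[u]))) != u

    seen, unitigs = set(), []

    def walk(u):
        path = [u]
        seen.add(u)
        v = merge_next(u)
        while v is not None and v not in seen:
            path.append(v)
            seen.add(v)
            v = merge_next(v)
        # Consecutive k-mers overlap by k-1 bases: append one base per extra k-mer.
        unitigs.append(path[0] + "".join(x[-1] for x in path[1:]))

    for u in sorted(nodes):
        if starts_unitig(u):
            walk(u)
    for u in sorted(nodes):  # leftover nodes lie on isolated cycles
        if u not in seen:
            walk(u)
    return unitigs
```

For example, compacted_dbg(["ACGTTGCA", "ACGATGCA"], 3) returns ["ACG", "CGATG", "CGTTG", "TGCA"]. The single mismatch between the two sequences opens a bubble (CGTTG vs CGATG) between the shared unitigs ACG and TGCA.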

Convert bcalm2's FASTA (with edge information) to GFA:

../tools/convertToGFA.py two_ecolis.unitigs.fa two_ecolis.unitigs.gfa 31
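The conversion mostly turns header fields into GFA links. The sketch below assumes bcalm2 headers of the form ">ID LN:i:... KC:i:... km:f:... L:+:ID2:-". In that form, each L: field records one edge together with the orientations of its two ends. Check this against your own output before relying on it. Segments overlap by k-1 bases, so each link gets a (k-1)M overlap, which explains the trailing 31 on the command line above. The function name bcalm_to_gfa is hypothetical and it is not claimed to reproduce convertToGFA.py. For instance, it emits every L: field as-is, without deduplicating edges that may be listed from both ends.

```python
def bcalm_to_gfa(fasta_text, k):
    """Convert bcalm2-style unitig FASTA (edges in headers) to GFA 1 lines.

    Assumed header format (hypothetical example): ">0 LN:i:5 KC:i:2 km:f:1.0 L:+:1:-"
    where each "L:<from_orient>:<to_id>:<to_orient>" field is one edge.
    """
    segments, links = [], []
    seg_id, seq_parts = None, []

    def flush():
        # Emit the segment collected so far (sequences may span several lines).
        if seg_id is not None:
            segments.append(f"S\t{seg_id}\t{''.join(seq_parts)}")

    for line in fasta_text.splitlines():
        if line.startswith(">"):
            flush()
            fields = line[1:].split()
            seg_id, seq_parts = fields[0], []
            for field in fields[1:]:
                if field.startswith("L:"):
                    _, from_orient, to_id, to_orient = field.split(":")
                    # Unitigs overlap by k-1 bases, written as the CIGAR "(k-1)M".
                    links.append(
                        f"L\t{seg_id}\t{from_orient}\t{to_id}\t{to_orient}\t{k - 1}M"
                    )
        elif line.strip():
            seq_parts.append(line.strip())
    flush()
    return ["H\tVN:Z:1.0"] + segments + links
```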

Simplify the graph with gfatools by removing small bubbles (-b 100) and compacting unitigs (-u):

../tools/gfatools asm -b 100 -u two_ecolis.unitigs.gfa > two_ecolis.unitigs.bu.gfa
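To illustrate what "removing a bubble" means, here is a minimal Python sketch for the simplest case. A simple bubble is the pattern s -> {a, b} -> t, where both branches are single short segments. The sketch ignores segment orientation (real GFA graphs are bidirected) and only handles this two-branch pattern. It also keeps branch a arbitrarily. gfatools' own bubble popping is more general and may choose differently. The name pop_simple_bubbles is hypothetical.

```python
from collections import defaultdict

def pop_simple_bubbles(segments, edges, max_len):
    """Drop one branch of each simple bubble s -> {a, b} -> t.

    segments: dict segment_id -> sequence; edges: set of (from_id, to_id).
    Orientation is ignored (a simplification of bidirected GFA graphs).
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    segments, edges = dict(segments), set(edges)  # do not mutate the inputs
    for s in sorted(succ):
        if len(succ[s]) != 2:
            continue
        a, b = sorted(succ[s])
        # Both branches must be single short segments entered only from s.
        branches_ok = all(
            pred[x] == {s} and len(succ[x]) == 1 and len(segments[x]) <= max_len
            for x in (a, b)
        )
        # ...and must reconverge on the same node t (with t distinct from s).
        if branches_ok and succ[a] == succ[b] and succ[a] != {s}:
            t = next(iter(succ[a]))
            # Arbitrary choice for this sketch: keep branch a, drop branch b.
            del segments[b]
            edges -= {(s, b), (b, t)}
    return segments, edges
```

After popping, s, a and t form a non-branching chain, which is why the command above also passes -u to merge such chains into unitigs.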

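Try a larger k value (k=300). This uses a separate bcalm binary, bcalm-k320, presumably compiled to support k-mer sizes above the default maximum: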

../tools/bcalm-k320 -in ../data/two_ecolis.fasta -kmer-size 300 -abundance-min 1
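Why try a larger k? In a dBG, an exact repeat creates branching nodes whenever it contains a full (k-1)-mer. Once k-1 exceeds the length of the longest exact repeat, every (k-1)-mer is unique and the repeat no longer tangles the graph. The made-up sequence below contains the 4 bp repeat CGTA twice. The sketch counts branching k-mers on the forward strand only, and the function name branching_kmers is hypothetical.

```python
def branching_kmers(seq, k):
    """Count distinct k-mers of seq with more than one successor in the dBG.

    Forward strand only: a successor of u is any k-mer of seq equal to u[1:] + base.
    """
    nodes = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sum(
        1 for u in nodes
        if sum((u[1:] + b) in nodes for b in "ACGT") > 1
    )

# Toy sequence: the 4 bp repeat CGTA occurs twice.
seq = "ACT" + "CGTA" + "GAC" + "CGTA" + "TGA"
for k in (3, 5, 6):
    print(k, branching_kmers(seq, k))
```

Running it reports 2 branching k-mers for k=3 and for k=5, and 0 for k=6. With k=6, k-1 = 5 exceeds the 4 bp repeat.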

Variation graph

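Create a raw pangenome graph using minimap2 (all-vs-all alignment: -c outputs base-level alignments, -X skips self and dual mappings), followed by seqwish (which induces the graph from the alignments):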

minimap2 -c -X ../data/two_ecolis.fasta ../data/two_ecolis.fasta > two_ecolis.paf
../tools/seqwish -s ../data/two_ecolis.fasta -p two_ecolis.paf -g two_ecolis.gfa
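seqwish induces a variation graph from alignments by merging aligned, identical bases and taking the transitive closure of those matches. The sketch below shows that idea at base level with a union-find structure. Every base position starts as its own node, aligned identical bases are merged, and each input sequence becomes a path through the resulting nodes. The real tool works from PAF alignments, handles reverse complements and emits compacted nodes. induce_graph and its input format (explicit lists of matched positions) are hypothetical simplifications.

```python
def induce_graph(seqs, matches):
    """Base-level graph induction from pairwise base matches (hypothetical sketch).

    seqs: dict name -> sequence
    matches: iterable of ((name, pos), (name, pos)) pairs of aligned identical bases
    Returns (node_count, paths, edges).
    """
    parent = {(name, i): (name, i) for name, s in seqs.items() for i in range(len(s))}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union-find computes the transitive closure of the match relation.
    for a, b in matches:
        assert seqs[a[0]][a[1]] == seqs[b[0]][b[1]], "only identical bases are merged"
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Number each equivalence class; each sequence becomes a path of class ids.
    node_id, paths, edges = {}, {}, set()
    for name, s in seqs.items():
        path = [node_id.setdefault(find((name, i)), len(node_id)) for i in range(len(s))]
        paths[name] = path
        edges.update(zip(path, path[1:]))
    return len(node_id), paths, edges
```

For example, take x = ACGT and y = AGGT aligned position by position, with a mismatch at position 1 (so only positions 0, 2 and 3 are matched). The graph then has 5 base-level nodes, and the paths are x: [0, 1, 2, 3] and y: [0, 4, 2, 3].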

Simplify the graph using smoothxg (this takes a while):

../tools/smoothxg -g two_ecolis.gfa -o two_ecolis.smooth.gfa

Further simplify by removing bubbles:

../tools/gfatools asm -b 1000 -u two_ecolis.gfa > two_ecolis.bu.gfa
../tools/gfatools asm -b 1000 -u two_ecolis.smooth.gfa > two_ecolis.smooth.bu.gfa

However, following discussions with Erik Garrison, it would be better to simply run pggb instead of seqwish + smoothxg, since pggb automatically tunes the parameters of smoothxg.

Minigraph

Construct a graph by aligning O157 to the K-12 reference using minigraph:

../tools/minigraph -cxggs -t8 ../data/k12.fasta ../data/o157.fasta > o157_on_k12.gfa
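minigraph writes its graph in rGFA, where each segment carries tags recording its origin: SN:Z (source sequence name), SO:i (offset in that sequence) and SR:i (rank; 0 for segments from the reference, here K-12). The sketch below summarizes a graph by rank, for example to see how much sequence O157 added on top of K-12. The toy input and the function name rgfa_rank_summary are hypothetical.

```python
def rgfa_rank_summary(gfa_text):
    """Count segments and bases per rGFA rank (SR:i tag; rank 0 = reference)."""
    summary = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S":
            continue  # only segment lines carry sequence
        tags = {}
        for tag in fields[3:]:
            name, _type, value = tag.split(":", 2)  # e.g. "SR:i:0"
            tags[name] = value
        seq = fields[2]
        # A "*" sequence means the length is only given by the LN tag.
        length = int(tags["LN"]) if seq == "*" else len(seq)
        rank = int(tags["SR"])
        count, total = summary.get(rank, (0, 0))
        summary[rank] = (count + 1, total + length)
    return summary
```

On o157_on_k12.gfa, rank 0 should account for the K-12 backbone and rank 1 for the O157-specific segments.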

Minimizer-space de Bruijn graphs

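Construct a minimizer-space de Bruijn graph (mdBG) with rust-mdbg, using k=10 (the number of minimizers per k-min-mer) and a minimizer density of d=0.001. The minimizer length l=12 that appears in the output file names is the default, since it is not set on the command line: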

../tools/rust-mdbg -k 10 -d 0.001 ../data/two_ecolis.fasta --reference --minabund 1
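In an mdBG, k counts minimizers rather than bases: a k-min-mer is a tuple of k consecutive minimizers, and d controls which l-mers count as minimizers. The sketch below assumes "universe" minimizers, where an l-mer is kept if its hash falls below d times the size of the hash space. This is our understanding of the approach described for mdBGs, but the hash (BLAKE2b truncated to 64 bits), the forward-strand-only scan and the function names are our own choices, not rust-mdbg's.

```python
import hashlib

HASH_SPACE = 2 ** 64  # hash64 returns integers in [0, 2**64)

def hash64(lmer):
    """Deterministic 64-bit hash of a string (BLAKE2b with an 8-byte digest)."""
    return int.from_bytes(hashlib.blake2b(lmer.encode(), digest_size=8).digest(), "big")

def universe_minimizers(seq, l, d):
    """(position, l-mer) pairs for every l-mer whose hash is below d * HASH_SPACE."""
    threshold = d * HASH_SPACE
    return [
        (i, seq[i:i + l])
        for i in range(len(seq) - l + 1)
        if hash64(seq[i:i + l]) < threshold
    ]

def kminmers(seq, k, l, d):
    """Tuples of k consecutive minimizers: the nodes of a minimizer-space dBG."""
    mins = [m for _, m in universe_minimizers(seq, l, d)]
    return [tuple(mins[i:i + k]) for i in range(len(mins) - k + 1)]
```

With d=0.001 and l=12, roughly one 12-mer in a thousand is expected to be selected. A k-min-mer with k=10 therefore spans about 9 kbp on average (9 gaps of about 1,000 bp each), which is why the mdBG is so much smaller than a base-space dBG.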

Compact the (minimizer-space) de Bruijn graph:

../tools/gfatools asm -u graph-k10-d0.001-l12.gfa > graph-k10-d0.001-l12.u.gfa

Convert the compacted mdBG back to base space (reincorporate the nucleotide sequences):

../tools/to_basespace -g graph-k10-d0.001-l12.u.gfa -s graph-k10-d0.001-l12
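Consecutive k-min-mers overlap by k-1 minimizers, not by a fixed number of bases, so the base-space sequence of a node cannot be rebuilt from minimizers alone. The bases between minimizers have to be recovered from the input. Assuming the start position of each minimizer in the original sequence is known, a k-min-mer covers the bases from the start of its first minimizer to the end (start + l) of its last one. The sketch below illustrates only that idea. kminmer_spans is hypothetical and is not a description of how to_basespace is implemented.

```python
def kminmer_spans(seq, positions, k, l):
    """Base-space sequence of each k-min-mer (hypothetical sketch).

    positions: sorted start positions of the minimizers in seq
    k: minimizers per k-min-mer; l: minimizer length in bases
    """
    spans = []
    for i in range(len(positions) - k + 1):
        start = positions[i]               # first base of the first minimizer
        end = positions[i + k - 1] + l     # one past the last base of the last minimizer
        spans.append(seq[start:end])
    return spans
```

Note that the two spans in kminmer_spans("AACCGGTTAACC", [0, 3, 7], 2, 3) overlap by 3 bases, the length of the shared minimizer. In general, the base-level overlap between consecutive k-min-mers varies with minimizer spacing.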

Commands for demo

Change the shell prompt:

export PS1="\[\e[0;36m\]pangenomics:\W\[\e[0m\]$ "