Lecture given at the CGSI, UCLA, July 22nd 2022.
Slides: https://docs.google.com/presentation/d/1KBckpDnKlDZpvRktt_RSAxUCTcOA9n83ERkDqNo3JKs/edit?usp=sharing
Video: https://www.youtube.com/watch?v=ve07C4hY94Y
We'll show how some common pangenome graphs can be constructed in practice, on a simple example.
The data consists of two E. coli genomes, located in the data/ folder.
Create compacted dBG with k=31 using bcalm2:
../tools/bcalm -in ../data/two_ecolis.fasta -kmer-size 31 -abundance-min 1
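As a toy illustration of what the nodes of a de Bruijn graph are built from, the sketch below lists the distinct k-mers of a short string. The `kmers` helper and the example sequence are made up for this README. Unlike bcalm2, it only looks at the forward strand and does not merge a k-mer with its reverse complement.

```shell
# Print the distinct k-mers of a sequence, one per line (forward strand only).
# Hypothetical helper for illustration; not part of bcalm2.
kmers() {
    printf '%s\n' "$1" |
        awk -v k="$2" '{ for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k) }' |
        sort -u
}

# Toy sequence with k=3: ACG occurs twice but is listed once.
kmers ACGTACGA 3
# ACG
# CGA
# CGT
# GTA
# TAC
```

With `-abundance-min 1`, every k-mer seen at least once is kept, which is what we want for assembled genomes (as opposed to reads with sequencing errors).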
Convert bcalm2's FASTA (with edge information) to GFA:
../tools/convertToGFA.py two_ecolis.unitigs.fa two_ecolis.unitigs.gfa 31
Simplify graph by removing small bubbles using gfatools:
../tools/gfatools asm -b 100 -u two_ecolis.unitigs.gfa > two_ecolis.unitigs.bu.gfa
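To see what bubble removal did, it can help to compare basic statistics of the graph before and after. The sketch below is a small hypothetical helper (`gfa_stats`, not part of gfatools) that counts S (segment) and L (link) lines and sums segment lengths. It assumes sequences are stored inline in the third column of S lines, not as `*` with an `LN` tag.

```shell
# Summarise a GFA file: number of segments, number of links, total bases.
# Hypothetical helper; assumes S lines carry the sequence in column 3.
gfa_stats() {
    awk -F'\t' '
        $1 == "S" { segs++; bases += length($3) }
        $1 == "L" { links++ }
        END { printf "segments=%d links=%d bases=%d\n", segs, links, bases }
    ' "$1"
}

# Toy GFA, made up for illustration.
toy_gfa=$(mktemp)
printf 'H\tVN:Z:1.0\n'          >  "$toy_gfa"
printf 'S\t0\tACGTA\n'          >> "$toy_gfa"
printf 'S\t1\tGTAC\n'           >> "$toy_gfa"
printf 'S\t2\tTTG\n'            >> "$toy_gfa"
printf 'L\t0\t+\t1\t+\t2M\n'    >> "$toy_gfa"
printf 'L\t0\t+\t2\t-\t2M\n'    >> "$toy_gfa"

gfa_stats "$toy_gfa"
# segments=3 links=2 bases=12
```

On the real data, running `gfa_stats two_ecolis.unitigs.gfa` and `gfa_stats two_ecolis.unitigs.bu.gfa` side by side should show how many nodes and edges the simplification removed.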
Trying a larger k value (k=300):
../tools/bcalm-k320 -in ../data/two_ecolis.fasta -kmer-size 300 -abundance-min 1
Create a raw pangenome graph using minimap2 + seqwish:
minimap2 -c -X ../data/two_ecolis.fasta ../data/two_ecolis.fasta > two_ecolis.paf
../tools/seqwish -s ../data/two_ecolis.fasta -p two_ecolis.paf -g two_ecolis.gfa
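Before feeding the alignments to seqwish, it can be useful to take a quick look at the PAF file. The sketch below is a hypothetical helper (`paf_identity`) that prints, for each alignment, the query name, target name, alignment block length (column 11) and an approximate identity computed as column 10 (matching bases) divided by column 11. The toy records are made up for illustration.

```shell
# Per-alignment summary of a PAF file: query, target, block length, identity.
# Identity here is matches / alignment block length (PAF columns 10 and 11).
paf_identity() {
    awk -F'\t' '$11 > 0 { printf "%s\t%s\t%d\t%.4f\n", $1, $6, $11, $10 / $11 }' "$1"
}

# Toy PAF with two records (12 mandatory columns each), made up for illustration.
toy_paf=$(mktemp)
printf 'k12\t1000\t0\t1000\t+\to157\t1200\t100\t1100\t950\t1000\t60\n' >  "$toy_paf"
printf 'o157\t1200\t0\t500\t-\tk12\t1000\t200\t700\t400\t500\t60\n'    >> "$toy_paf"

paf_identity "$toy_paf"
# k12	o157	1000	0.9500
# o157	k12	500	0.8000
```

On the real data this would be `paf_identity two_ecolis.paf | sort -k3,3nr | head` to list the longest alignments first.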
Simplify using smoothxg (Takes a while!):
../tools/smoothxg -g two_ecolis.gfa -o two_ecolis.smooth.gfa
Further simplify by removing bubbles:
gfatools asm -b 1000 -u two_ecolis.gfa > two_ecolis.bu.gfa
gfatools asm -b 1000 -u two_ecolis.smooth.gfa > two_ecolis.smooth.bu.gfa
However, after discussions with Erik Garrison, it would be better to just run pggb instead of seqwish+smoothxg, since it automatically tweaks the parameters of smoothxg.
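A pggb run on the same data could look roughly like the sketch below. This was not part of the lecture: the `run_pggb` wrapper, the output folder name and the thread count are assumptions, and only the commonly documented pggb options (`-i`, `-o`, `-n`, `-t`) are used. With `DRY_RUN=1` the commands are only printed, so the block can be tried without pggb installed. Depending on the pggb version, the input FASTA may also need to be bgzip-compressed before indexing.

```shell
# Print (DRY_RUN=1) or execute the steps of a basic pggb run.
# Hypothetical wrapper: arguments are FASTA, number of genomes, output dir, threads.
run_pggb() {
    fasta=$1
    n_haps=$2
    outdir=$3
    threads=${4:-8}
    # Echo each command; execute it only when DRY_RUN is unset or empty.
    run() { echo "+ $*"; [ -n "${DRY_RUN:-}" ] || "$@"; }
    run samtools faidx "$fasta"                                   # pggb expects an indexed FASTA
    run pggb -i "$fasta" -o "$outdir" -n "$n_haps" -t "$threads"  # -n: number of genomes (2 here)
}

DRY_RUN=1 run_pggb ../data/two_ecolis.fasta 2 two_ecolis_pggb
# + samtools faidx ../data/two_ecolis.fasta
# + pggb -i ../data/two_ecolis.fasta -o two_ecolis_pggb -n 2 -t 8
```

Dropping `DRY_RUN=1` runs the commands for real; the resulting GFA ends up in the output folder.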
Construct by aligning o157 on the K12 reference using minigraph:
../tools/minigraph -cxggs -t8 ../data/k12.fasta ../data/o157.fasta > o157_on_k12.gfa
Construct a minimizer-space de Bruijn graph (mdBG) with k=10, d=0.001 using rust-mdbg:
../tools/rust-mdbg -k 10 -d 0.001 ../data/two_ecolis.fasta --reference --minabund 1
Compact the (minimizer-space) de Bruijn graph:
../tools/gfatools asm -u graph-k10-d0.001-l12.gfa > graph-k10-d0.001-l12.u.gfa
Reincorporate bases in mdBG:
../tools/to_basespace -g graph-k10-d0.001-l12.u.gfa -s graph-k10-d0.001-l12
Change prompt:
export PS1="\[\e[0;36m\]pangenomics:\W\[\e[0m\]$ "