A web-based, interactive pangenome visualization tool
A pangenome represents the genetic variation in a population as a variation graph, which greatly reduces the reference
bias that comes from using a linear reference genome. However, a linear reference genome is more intuitive to
understand and is what bioinformaticians have traditionally worked with. To make variation graphs easier to visualize
and interpret, I present pgv, a visualization tool built on top of previous work that aims to display structural
variants in a variation graph interactively.
Instead of fitting the nodes, edges, and paths of a variation graph in a 2-dimensional space, pgv draws the sequence
graph itself on the x-y plane, and paths are rendered as separate layers that can be interactively selected/highlighted.
- The paths in a variation graph can be interactively selected/highlighted.
A live demo is available at: https://w-gao.github.io/pgv.
If you want to try pgv yourself, you can get started in the following ways. Keep in mind that this project is under
development and may not work well with your own data. If you encounter any issues, please let me know by
opening an issue.
The easiest way to run pgv is through Docker:
$ docker pull wlgao/pgv:latest
To run a container:
$ docker run -d --name pgv \
-v "$(pwd)/examples":/pgv/ui/examples \
-p 8000:8000 \
wlgao/pgv:latest
This creates a container in detached (-d) mode, exposes port 8000, and mounts a volume for the graph files.
You can add additional volumes if you want to construct your own graphs inside the container. Alternatively, if you
have vg installed on your local system, you can use the pgv CLI to pre-process graphs.
If successful, pgv should be running at:
http://localhost:8000
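If the page does not load right away, it can help to confirm that something is answering on the port. The following is a minimal sketch and not part of pgv: `wait_for_pgv` is a hypothetical helper name, and its default URL is simply the address above. It polls with curl until the server responds or the attempts run out.

```shell
# Hypothetical helper (not part of pgv): poll a URL until it responds.
# Usage: wait_for_pgv [url] [attempts]
wait_for_pgv() {
    _url="${1:-http://localhost:8000}"
    _attempts="${2:-10}"
    _i=0
    while [ "$_i" -lt "$_attempts" ]; do
        # --fail makes curl exit non-zero on HTTP error responses.
        if curl --silent --fail --output /dev/null "$_url"; then
            echo "pgv is responding at $_url"
            return 0
        fi
        _i=$((_i + 1))
        sleep 1
    done
    echo "no response from $_url after $_attempts attempts" >&2
    return 1
}
```

For example, `wait_for_pgv http://localhost:8000 30` makes up to 30 attempts, one second apart, and returns a non-zero status if none of them succeeds.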
To stop the container, run:
docker stop pgv
Alternatively, you can pull the source code and build the project yourself. The minimum requirements are:
- Node.js >= 16
- Yarn < 2
- Python >= 3.8
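The requirements above can be checked before building. This is a rough sketch, not a script shipped with pgv: the function names are made up for illustration, and it assumes a `sort` that supports `-V` (version sort), as in GNU coreutils.

```shell
# Succeeds if dotted version $1 is greater than or equal to $2.
# Assumes `sort -V` is available (GNU coreutils and recent BSD/macOS).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds if dotted version $1 is strictly less than $2.
version_lt() {
    ! version_ge "$1" "$2"
}

# Hypothetical helper: report whether each required tool is in range.
check_requirements() {
    _status=0

    _node="$(node --version 2>/dev/null | sed 's/^v//')"
    if [ -n "$_node" ] && version_ge "$_node" 16; then
        echo "ok: Node.js $_node"
    else
        echo "fail: Node.js >= 16 required (found: ${_node:-none})"
        _status=1
    fi

    _yarn="$(yarn --version 2>/dev/null)"
    if [ -n "$_yarn" ] && version_lt "$_yarn" 2; then
        echo "ok: Yarn $_yarn"
    else
        echo "fail: Yarn < 2 required (found: ${_yarn:-none})"
        _status=1
    fi

    _python="$(python3 --version 2>/dev/null | awk '{print $2}')"
    if [ -n "$_python" ] && version_ge "$_python" 3.8; then
        echo "ok: Python $_python"
    else
        echo "fail: Python >= 3.8 required (found: ${_python:-none})"
        _status=1
    fi

    return "$_status"
}

check_requirements || echo "One or more requirements are not met." >&2
```

The comparison relies on version sort rather than string comparison, so that, for instance, Python 3.10 is correctly treated as newer than 3.8.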
To clone the repo:
git clone https://github.com/w-gao/pgv.git
cd pgv
Then, build the project:
# Run the prebuild script.
# Note: this script requires curl.
./prebuild.sh
# Build core package.
yarn core:build
# Build web package.
yarn web:build
# Start a preview HTTP server.
yarn web:preview
If successful, pgv should be running at:
http://localhost:8000
When you select a graph, you are presented with this interface:
This is split into three sections:
- The header for graph selection and navigation
- The sequenceTubeMap render of the graph
- The pgv render of the graph
- To navigate the graph, you can use the ← and → buttons in the header, or the A and D keys.
- To cycle between the paths, you can use the ↑ and ↓ buttons, or the up and down arrow keys.
- You can also move closer to or away from the graph using the W and S keys, and up and down using the R and F keys.
- (Controls such as movement speed are limited at the moment, but can be easily added if needed.)
If you want to use your own data, you need to pre-process the files first. This can be done with the pgv CLI.
Currently supported file formats are: FASTA (.fa), VCF (.vcf, .vcf.gz), and GBWT (.gbwt).
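As a quick pre-check before invoking the CLI, file names can be matched against the extensions listed above. This sketch only looks at the file name and is an illustration, not how the pgv CLI itself detects formats; `input_format` and `notes.txt` are hypothetical.

```shell
# Hypothetical helper: map a file name to a supported input format,
# judging by extension only. Returns non-zero for anything else.
input_format() {
    case "$1" in
        *.fa) echo "FASTA" ;;
        *.vcf | *.vcf.gz) echo "VCF" ;;
        *.gbwt) echo "GBWT" ;;
        *) echo "unsupported"; return 1 ;;
    esac
}

# Print the detected format for a few example file names.
for f in x.fa x.vcf.gz x.vg.gbwt notes.txt; do
    printf '%s\t%s\n' "$f" "$(input_format "$f")"
done
```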
The following assumes that you have vg installed. You also need cli.py, which is pre-installed inside the pgv
container, or can be downloaded via:
curl -O https://raw.githubusercontent.com/w-gao/pgv/main/cli.py
For example, to construct a graph from a FASTA file (x.fa) and a VCF file (x.vcf.gz), run:
$ python3 cli.py add example \
--reference x.fa \
--vcf x.vcf.gz
This will pre-process the input files and write the results to the folder set by --dest, which is ./examples/ by
default. When running locally, the dev server symlinks to this default folder to grab the graph files. When running via
Docker, the /pgv/ui/examples path inside the container is statically served. If you change this path, make sure to
update the configuration in the respective places as well.
Graph genomes are often large, so you can specify a range to display only a small chunk of the graph. You can also
specify a GBWT file to include additional haplotypes for that range. For example, to limit the above graph to the node
range 1:100, with more haplotypes from x.vg.gbwt:
$ python3 cli.py add example \
--reference x.fa \
--vcf x.vcf.gz \
--node-range 1:100 \
--gbwt-name x.vg.gbwt
If you have an existing graph but want to update the range, you can use the update subcommand so it doesn't have to
re-construct and re-index the graph file:
$ python3 cli.py update example \
--node-range 1:100 \
--gbwt-name x.vg.gbwt
For more information on the CLI, run:
$ python3 cli.py --help
Copyright (c) 2023 William Gao. MIT license.