Even closely related bacterial genomes can differ in the presence of hundreds of genes and individual genes can be horizontally acquired from distant strains and species. This mix of inheritance patterns complicates phylogenetic analysis of bacteria.
Although several software packages are available for pan-genome analysis, visualization, interpretation and exploration of pan-genomes remain challenging. panX (Pan-genome Analysis and Exploration) aims to facilitate pan-genome research with an easy-to-use, interactive platform for analyzing and exploring pan-genomic data.
panX displays the pan-genome using interconnected visual components, including a gene cluster table, a multiple alignment, comparative phylogenetic tree viewers and a strain metadata table. The pan-genome data structures are prepared by our pan-genome-analysis pipeline, which efficiently identifies orthologous clusters from large sets of genome sequences and pre-computes alignments, trees and a range of informative statistics. panX is available at pangenome.de.
git clone https://github.com/neherlab/pan-genome-visualization
cd pan-genome-visualization
git submodule update --init
npm install
npm start
Then open http://localhost:8000/ in a browser (an outdated browser may need to be upgraded).
If npm start fails, make sure that Node.js is up to date.
The example page demonstrates the panX visualization and exploration features, although only a few species and a subset of their gene clusters are included in the repository. Complete pan-genomes are available at pangenome.de.
based on post-vaccine epidemiology of 616 S. pneumoniae strains (Croucher et al. 2015)
Add your own page to the local server (add-new-pages-repo.sh)
Using the standard layout (gene cluster table and alignment on the same row):
bash add-new-pages-repo.sh your_species
Using the wide layout (gene cluster table on its own row, alignment below it):
bash add-new-pages-repo.sh your_species wide
After finishing the pan-genome-analysis pipeline, please use the script link-to-server.py to transfer your data to the local server.
./link-to-server.py -s E_coli -v /usr/pan-genome-visualization
Restart your server and enjoy your own interactive pan-genome dashboard!
Once branch association (BA) and presence/absence association (PA) with metadata (e.g. drug concentration) have been calculated by the panX analysis pipeline, users can create custom pages to flexibly visualize these associations.
Example of how the page should be customized for BA and PA:
Add this line and the corresponding file (replace "yourSpecies" with your dataset name):
script(src='dataset/yourSpecies/newColumnConfig.js')
Example for newColumnConfig.js
The panX analysis pipeline uses DIAMOND, MCL and post-processing to determine clusters of orthologous genes from a collection of annotated genomes. panX generates a strain/species tree based on core genome SNPs and a gene tree for each gene cluster.
panX interactive visualization: (1) The dynamic pan-genome statistical charts allow rapid filtering and selection of gene subsets in cluster table;
clicking a gene cluster in cluster table loads (2) related alignment, (3) individual gene tree and (4) gene presence/absence and gain/loss pattern on strain/species tree;
(5) Selecting sequences in alignment highlights associated strains on strain/species tree;
(6) (7) Strain/species tree interacts with gene tree in various ways;
(8) Zooming into a clade on strain/species tree filters the strains shown in metadata table;
(9) Searching in metadata table displays strains pertinent to specific meta-information.
-
Is my dataset publicly available or does it remain private?
The panX visualization application can either be hosted on a web server or used locally to explore custom pan-genomes produced by the panX analysis pipeline. If you run it locally and access it via http://localhost:8000, only you can see it. If you want to make your data publicly available, you can host it on a web server.
-
How can I host the panX visualization application on my own server?
Here is a tutorial for hosting and deploying a Node.js application on Ubuntu: https://www.terlici.com/2015/02/05/hosting-deploying-nodejs.html
-
How do I use a port other than 8000?
Open the file ./bin/www and go to the line
var port = normalizePort(process.env.PORT || '8000');
Change '8000' to '3000', then open http://localhost:3000
-
I pushed the panX analysis result to the visualization repository, but cannot find my own data.
After sending your data to the local server and adding your own page to the local server, you can access it at http://localhost:8000/my_species.
The visualization repository only shows the test datasets in the drop-down menu for "species selection". To add a new species to the drop-down menu, edit the configuration file species-list-info.js.
-
Strange errors when using dots in the species/dataset name (e.g. localhost:8000/E.coli)
Dots normally separate the parts of a domain name, so please use an underscore ("_") instead of a dot in the species/dataset name within the URL.
E.g.: http://localhost:8000/E_coli instead of http://localhost:8000/E.coli
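As a minimal sketch of the last two answers (port selection and dot-free dataset names), the JavaScript below builds the local URL for a dataset page. The functions resolvePort, toDatasetName and localUrl are hypothetical helpers introduced here for illustration and are not part of panX. normalizePort is written in the style of the standard express-generator template, which is an assumption about what ./bin/www contains.

```javascript
// Hypothetical helpers for illustration; they are not part of the panX code base.

// Port parsing in the style of the express-generator bin/www template
// (assumed): numeric strings become numbers, anything else is kept as-is
// and treated as a named pipe.
function normalizePort(val) {
  const port = parseInt(val, 10);
  if (isNaN(port)) return val; // named pipe
  if (port >= 0) return port;  // port number
  return false;
}

// Mirrors the expression quoted from ./bin/www: the PORT environment
// variable takes precedence, otherwise the default '8000' is used.
function resolvePort(env) {
  return normalizePort(env.PORT || '8000');
}

// Replace runs of dots and whitespace with a single underscore so the
// dataset name can be used as a URL path segment (E.coli -> E_coli).
function toDatasetName(name) {
  return name.trim().replace(/[.\s]+/g, '_');
}

// Build the local URL for a dataset page.
function localUrl(species, env = {}) {
  return `http://localhost:${resolvePort(env)}/${toDatasetName(species)}`;
}

console.log(localUrl('E.coli'));                   // http://localhost:8000/E_coli
console.log(localUrl('E.coli', { PORT: '3000' })); // http://localhost:3000/E_coli
```

Because the quoted line reads process.env.PORT before falling back to '8000', starting the server with PORT=3000 npm start should have the same effect as editing ./bin/www, assuming npm start launches that script in the same environment.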
Last updated: 2021-04-01
-
Put the app code into /www/aws_pangenome
-
Install Docker and the AWS CLI
./scripts/install_docker.sh
./scripts/install_aws.sh
-
Put the data into /www/aws_pangenome_dataset
For example, the data from the previous deployment is on S3:
cd /www/aws_pangenome_dataset
aws s3 sync s3://data.pangenome.ch-2021-04-01/ .
(You will need to be authenticated, for example with the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. You can create temporary keys or, better, a temporary user on AWS. Make sure you remove those keys from ~/.bash_history afterwards!)
If the AWS CLI gets killed by the out-of-memory (OOM) killer, reduce its memory consumption by adding this to ~/.aws/config:
[profile default]
s3 =
  max_concurrent_requests = 5
  max_bandwidth = 10MB/s
-
Run the app
cd /www/aws_pangenome
screen -qxRS pangenome -- ./scripts/run-attached.sh
-
Detach from the screen session with Ctrl+A, Ctrl+D. It is safe to log out now.
-
To reattach to the screen session, run screen -qxRS pangenome again.
-
Once the containers are up, run certbot manually once (inside the nginx container) to create the SSL certificates:
docker exec -u 0 -it pangenome-nginx /certbot.sh
The renewal is automatic, every 12 days. See these files:
config/docker/nginx/files/etc/cron.d/jobs # cron job
config/docker/nginx/files/certbot.sh # the script which cron job runs
config/docker/nginx/files/etc/nginx/http.conf # nginx config to serve ACME challenge files
config/docker/nginx/files/etc/nginx/https.conf # nginx config to proxy to the app
-
If something goes wrong and you want to start over, delete or move the .volumes directory as root. (Warning: if deleted, all logs, certs and other persisted data will be lost!)