pangolin

Phylogenetic Assignment of Named Global Outbreak LINeages

Full pangolin documentation can be found at cov-lineages.org

Find the pangolin web application here, thanks to the Centre for Genomic Pathogen Surveillance!

Requirements

Pangolin runs on macOS and Linux. The conda environment recipe may not build on Windows (this has not been tested), but pangolin can be run using the Windows Subsystem for Linux.

Some version of conda (we use Miniconda3), which can be downloaded from here
Your query fasta file

Install pangolin

Clone this repository and cd pangolin
conda env create -f environment.yml
conda activate pangolin
python setup.py install
That's it

For help troubleshooting the install, see the pangolin wiki

Note: we recommend using pangolin in the conda environment specified in the environment.yml file, as per the instructions above. If you can't use conda for some reason, bear in mind that the data files are hosted in two separate repositories:

cov-lineages/lineages
cov-lineages/pangoLEARN

You will need to pip install them alongside the other dependencies for pangolin (details can be found in environment.yml).

Check the install worked

Type (in the pangolin environment):

pangolin -v
pangolin -pv

and you should see the version of pangolin and the pangoLEARN data release printed, respectively.
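
The same check can be scripted, for example at the start of a pipeline. This is a minimal Python sketch, not part of pangolin itself: the function name installed_versions is hypothetical, and it relies only on the -v and -pv flags described above. It assumes the script is run from within the activated pangolin environment.

```python
import shutil
import subprocess


def installed_versions(executable="pangolin"):
    """Return whatever `-v` and `-pv` print, or None if the command is not on PATH."""
    # shutil.which returns None when the command cannot be found,
    # e.g. because the conda environment has not been activated.
    if shutil.which(executable) is None:
        return None
    versions = {}
    for flag in ("-v", "-pv"):
        result = subprocess.run(
            [executable, flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        # Keep both streams, since version text may be written to either one.
        versions[flag] = (result.stdout + result.stderr).strip()
    return versions
```

Calling installed_versions() returns None when pangolin cannot be found, which is usually a sign that the environment is not active.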

Updating pangolin

Note: pangolin is under intensive development, so even if you have installed it previously, we recommend checking for updates before each run.

To update pangolin and pangoLEARN automatically to the latest stable release:

conda activate pangolin
pangolin --update

If a major release introduces extra dependencies, the full conda environment will also need to be updated (see the conda env update step in the manual procedure).

Alternatively, the update can be done manually:

conda activate pangolin
git pull  # pulls the latest changes from GitHub
python setup.py install  # re-installs pangolin
conda env update -f environment.yml  # updates the conda environment (you're unlikely to need to do this, but just in case!)
pip install git+https://github.com/cov-lineages/pangoLEARN.git --upgrade  # updates pangoLEARN if there is a new data release

Updating from pangolin v1.0 to pangolin v2.0

If you pass a data path with -d, it must now point to the pangoLEARN data directory instead of the lineages one, for example:

-d /home/vix/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangoLEARN/data
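
The exact path differs between machines and Python versions, so it can help to look it up rather than hard-code it. This is a minimal Python sketch under one assumption taken from the example path above: that the pangoLEARN package, once installed, contains a data subdirectory. The function name find_data_dir is hypothetical.

```python
import importlib.util
import os


def find_data_dir(package="pangoLEARN"):
    """Return the `data` directory inside an installed package, or None."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    # find_spec returns None when the package is not installed.
    if spec is None or spec.origin is None:
        return None
    # Assumption: the data files live in <package directory>/data.
    data_dir = os.path.join(os.path.dirname(spec.origin), "data")
    return data_dir if os.path.isdir(data_dir) else None
```

Run inside the pangolin environment, find_data_dir() gives a candidate value for -d, or None if the package or its data directory is missing.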

The columns in the output file have also changed, unless you run with --legacy:

Removed fields: UFBootstrap, aLRT and lineages_version
New fields: probability and pangoLEARN_version
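
Downstream scripts that read both old and new output files can tell them apart from the header. This is a minimal Python sketch, assuming the first row of the csv file holds the field names listed above. The function name output_format is hypothetical, and names are compared case-insensitively because the exact capitalisation in real files is not confirmed here.

```python
import csv
import io

# Field names taken from the list above, lower-cased for comparison.
LEGACY_ONLY = {"ufbootstrap", "alrt", "lineages_version"}
V2_ONLY = {"probability", "pangolearn_version"}


def output_format(handle):
    """Classify an output file as 'v2', 'legacy' or 'unknown' from its header row."""
    header = next(csv.reader(handle), [])
    fields = {name.strip().lower() for name in header}
    if fields & V2_ONLY:
        return "v2"
    if fields & LEGACY_ONLY:
        return "legacy"
    return "unknown"
```

For example, output_format(open("results.csv")) could be used to choose which columns to read, where results.csv is a placeholder file name.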

Basic usage

Activate the environment: conda activate pangolin
Run pangolin <query>, where <query> is the name of your input file.
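
When pangolin is called from another program, the same two steps apply. This is a minimal Python sketch that wraps the command above. The function name run_pangolin is hypothetical, no options beyond the query file are passed, and it assumes the calling process already runs inside the activated pangolin environment.

```python
import os
import subprocess


def run_pangolin(query, executable="pangolin"):
    """Run `pangolin <query>` and raise if the query file is missing or the run fails."""
    # Fail early with a clear error rather than letting the tool start on a bad path.
    if not os.path.isfile(query):
        raise FileNotFoundError(query)
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run([executable, query], check=True)
```

Checking the path first keeps a mistyped file name from being reported as a pangolin failure.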

Output

Your output will be a csv file containing the taxon name and the lineage assigned, with one line for each sequence in the fasta file provided.

Example:

Taxon	Lineage	support	pangoLEARN_version	status	note
Virus1	B.1	80	2020-04-27	passed_qc
Virus2	A.1	65	2020-04-27	passed_qc
Virus3	A.3	100	2020-04-27	passed_qc
Virus4	B.1.4	82	2020-04-27	passed_qc
Virus5	None	0	2020-04-27	fail	N_content:0.80
Virus6	None	0	2020-04-27	fail	seq_len:0
Virus7	None	0	2020-04-27	fail	failed to map
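
A report like this is often summarised before further analysis. This is a minimal Python sketch that counts assigned lineages and collects the failure notes. It assumes a comma-separated file whose header uses the column names from the example (Taxon, Lineage, status, note) and that passing rows carry the status passed_qc. The function name summarise is hypothetical.

```python
import csv
import io
from collections import Counter


def summarise(handle, delimiter=","):
    """Return (lineage counts for passing rows, {taxon: note} for failing rows)."""
    lineages = Counter()
    failures = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        if row.get("status") == "passed_qc":
            lineages[row["Lineage"]] += 1
        else:
            # The note column may be absent on short rows, so default to "".
            failures[row["Taxon"]] = row.get("note") or ""
    return lineages, failures
```

If the file is tab-separated instead, pass delimiter="\t".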

Citing pangolin

A publication on pangolin is in preparation. In the meantime, please link to this GitHub repository, github.com/cov-lineages/pangolin, if you have used pangolin in your research.

References

The following external software is run as part of pangolin:

minimap2

Heng Li, Minimap2: pairwise alignment for nucleotide sequences, Bioinformatics, Volume 34, Issue 18, 15 September 2018, Pages 3094–3100, https://doi.org/10.1093/bioinformatics/bty191

snakemake

Johannes Köster and Sven Rahmann, Snakemake - A scalable bioinformatics workflow engine, Bioinformatics, 2012

ArtPoon/pangolin