PGCGAP - the Prokaryotic Genomics and Comparative Genomics Analysis Pipeline


	  ____       ____      ____     ____       _        ____    
	U|  _"\ u U /"___|u U /"___| U /"___|u U  /"\  u  U|  _"\ u 
	\| |_) |/ \| |  _ / \| | u   \| |  _ /  \/ _ \/   \| |_) |/ 
	 |  __/    | |_| |   | |/__   | |_| |   / ___ \    |  __/   
	 |_|        \____|    \____|   \____|  /_/   \_\   |_|      
	 ||>>_      _)(|_    _// \\    _)(|_    \\    >>   ||>>_    
	(__)__)    (__)__)  (__)(__)  (__)__)  (__)  (__) (__)__)

Introduction
Installation
Required dependencies
License
Feedback and Issues
Citation
Usages

Introduction

PGCGAP is a pipeline for prokaryotic comparative genomics analysis. It takes paired-end reads as input. In addition to genome assembly, gene prediction and annotation, it produces the common comparative genomics results with a single command, including phylogenetic trees based on single-copy core proteins and core SNPs, the pan-genome, whole-genome Average Nucleotide Identity (ANI), orthogroups and orthologs, COG annotations, substitutions (SNPs) and insertions/deletions (indels), and the mining of antimicrobial resistance and virulence genes.

Installation

PGCGAP has been tested successfully on Windows (WSL), Linux x64 and macOS. Because it depends on a large number of other programs, installing it with Bioconda is recommended.

Step 1: Install PGCGAP

$conda create -n pgcgap python=3
$conda activate pgcgap
$conda install -c conda-forge -c bioconda pgcgap

Note: what if the installation is slow? As conda hosts more and more packages and its index files grow, the solver has to search an ever larger space to find versions that satisfy every dependency in the environment. The "Solving environment" step therefore becomes slower and slower, and sometimes the software cannot be installed with conda at all. Rather than just waiting, you can try one of the following methods.

Method 1: use mamba, a faster reimplementation of the conda package manager, to speed up the "Solving environment" step.
```
  $conda activate pgcgap
  $conda install mamba -c conda-forge
  $mamba install -c conda-forge -c bioconda pgcgap
```

Method 2: use the environment file (YAML) we provide to work around the slow "Solving environment" step. Run the following commands to download the latest environment file and install PGCGAP:

  # download pgcgap_latest_env.yml (use the raw file URL so that wget fetches the YAML file, not the GitHub web page)
  $wget https://raw.githubusercontent.com/liaochenlanruo/pgcgap/master/conda/pgcgap_latest_env.yml
  
  # create a conda environment named pgcgap and install the latest version of PGCGAP
  $conda env create -f pgcgap_latest_env.yml

Step 2: Set up the COG database (run this once after installing PGCGAP for the first time)

$conda activate pgcgap
$pgcgap --setup-COGdb
$conda deactivate

Users who have Docker installed can alternatively pull the PGCGAP container image from Biocontainers:

$docker pull quay.io/biocontainers/pgcgap:<tag>

(see the pgcgap/tags page on Quay.io for valid values of <tag>)
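Once an image has been pulled, PGCGAP can be run inside a container with the working directory mounted. The wrapper below is a minimal sketch rather than an official interface: the function name `pgcgap_docker`, the `PGCGAP_TAG` variable and the `/data` mount point are hypothetical choices, and it assumes the image provides the `pgcgap` command on its `PATH`, as Biocontainers images normally do.

```shell
# Hypothetical helper (not part of PGCGAP): run pgcgap from the Biocontainers
# image on files in the current directory.
# PGCGAP_TAG must hold a valid tag from the quay.io pgcgap/tags page.
pgcgap_docker() {
  if [ -z "${PGCGAP_TAG:-}" ]; then
    echo "Please set PGCGAP_TAG to a valid image tag" >&2
    return 1
  fi
  # --rm removes the container when it exits; -v mounts the host working
  # directory at /data and -w makes /data the working directory, so relative
  # paths given to pgcgap resolve to the same files as on the host.
  docker run --rm \
    -v "$PWD":/data -w /data \
    "quay.io/biocontainers/pgcgap:${PGCGAP_TAG}" \
    pgcgap "$@"
}
```

For example, `export PGCGAP_TAG=<tag>` followed by `pgcgap_docker <options>` runs PGCGAP on the files in the current directory. Because `--rm` discards the container when it exits, only output written under `/data` (the mounted host directory) is kept.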

Required dependencies

Abricate
ABySS
Canu
CD-HIT
Coreutils
Diamond
FastANI
FastME
Fastp
Gblocks
Gubbins >=2.3.4
Htslib
IQ-TREE
Mafft
Mash
MMseqs2
Muscle
NCBI-blast+
OrthoFinder
OpenJDK8
PAL2NAL v14
Perl & the modules
- perl-bioperl
- perl-data-dumper
- perl-file-tee
- perl-getopt-long
- perl-pod-usage
- perl-parallel-forkmanager
Prokka
Python & the modules
- biopython
- matplotlib
- numpy
- pandas
- seaborn
R & the packages
- corrplot
- ggplot2
- gplots
- pheatmap
- plotrix
Roary
Sickle-trim
Snippy
Snp-sites
trimAl
Unicycler
wget
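Because PGCGAP calls many external programs, it can help to confirm that they are reachable on `PATH` inside the activated environment. The snippet below is a minimal sketch that is not shipped with PGCGAP; the function name `missing_tools`, the variable `PGCGAP_TOOLS` and the executable names are assumptions based on the commands each package usually installs (for example `tabix` for HTSlib, `run_gubbins.py` for Gubbins and `iqtree` for IQ-TREE 1.x, which IQ-TREE 2 names `iqtree2`), so they may differ between versions. Coreutils and the Perl, Python and R modules are not checked.

```shell
# Hypothetical helper (not part of PGCGAP): print every command given as an
# argument that cannot be found on PATH, one per line.
# Returns 0 if all commands were found, 1 otherwise.
missing_tools() {
  missing_rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing_rc=1
    fi
  done
  return "$missing_rc"
}

# Assumed executable names for the dependencies listed above (plus pgcgap itself).
PGCGAP_TOOLS="pgcgap abricate abyss-pe canu cd-hit diamond fastANI fastme fastp
Gblocks run_gubbins.py tabix iqtree mafft mash mmseqs muscle blastp orthofinder
java pal2nal.pl perl prokka python Rscript roary sickle snippy snp-sites trimal
unicycler wget"
```

For example, after `conda activate pgcgap`, running `missing_tools $PGCGAP_TOOLS` prints the names of any programs that were not found. The variable is deliberately left unquoted so that the shell splits it into separate names.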

License

PGCGAP is free software, licensed under GPLv3.

Feedback and Issues

Please report any issues to the issues page or email us at liaochenlanruo@webmail.hzau.edu.cn.

Citation

If you use this software, please cite: Liu H, Xin B, Zheng J, Zhong H, Yu Y, Peng D, Sun M. Build a bioinformatics analysis platform and apply it to routine analysis of microbial genomics and comparative genomics. Protocol Exchange, 2021. DOI: 10.21203/rs.2.21224/v5

Usages

For more detailed information, please visit the PGCGAP webpage and Wiki.
