The PGPT-Pred tool is part of the web resource PLaBAse (https://plabase.cs.uni-tuebingen.de/pb/plabase.php). It annotates bacterial plant growth-promoting traits (proteins), "PGPTs" for short, in single genomes, using either blastp+hmmer or the IMG-KEGG-annotation Mapper against the PGPT ontology.
When applying PGPT-Pred via PLaBAse, always cite the following reference:
Patz S, Gautam A, Becker M, Ruppel S, Rodríguez-Palenzuela P, Huson DH. PLaBAse: A comprehensive web resource for analyzing the plant growth-promoting potential of plant-associated bacteria. bioRxiv 2021, https://doi.org/10.1101/2021.12.13.472471 (preprint)
- PLaBAse v.1.01 - Jan 2022
- Copyright (C) 2020-2022 Sascha Patz.
NOTE:
This web platform and all its services/tools (e.g., PGPT-db, PGPT-Pred and PIFAR-Pred) come with ABSOLUTELY NO WARRANTY!
The code given here is implemented in the backend of PLaBAse and its PGPT-Pred tool!
Here, we provide the ontology and annotation only as an example, for nitrogenase-encoding genes and proteins!
The PGPT-Pred tool is fully functional, with all its features and the complete PGPT ontology, only when applied via the online platform (see link above)!
- Interpreters
- Python version 3.7
- Python packages (pyfasta, concurrent.futures.thread, ete3)
- R version 4.0.2 (2020-06-22)
- R libraries (phyloseq, https://raw.githubusercontent.com/mahendra-mariadassou/phyloseq-extended/master/R/load-extra-functions.R, psadd, argparse, stringr)
- Tools
- blastp (ncbi-blast-2.10.1)
- hmmer version 3.3
- faSplit (current version at path ksrc/src/utils/faSplit after downloading the jksrc folder from: http://hgdownload.cse.ucsc.edu/admin/jksrc.zip)
- KronaTools version 2.8 (https://github.com/marbl/Krona/wiki)
- Data
- PGPT-ontology and respective blastp database (via the PLaBAse web resource), version 1.01 (Date: Dec 2021):
- An example is currently located in the source (src) and factors folders; the entire ontology will be made available via an API in the future (under development)
- PGPT-ontology (nitrogenase, blastp+hmmer):
factors/PlantGrowthPromotingTraits.csv
- PGPT-protein blastp database (nitrogenase, blastp+hmmer):
factors/PGPT-blastpdb/nitrogenasePGPT
- PGPT-ontology (nitrogenase, IMG-KEGG-annotation Mapper):
src/pgpt_class_kegg.txt
- blastp+hmmer: genomic protein sequences in FASTA format (sorted by genomic location)
- IMG-KEGG-annotation Mapper: genomic protein KEGG annotations (obtained from the IMG server, or in a custom format)
KEGG custom formats:
- Please see the manual for the accepted KEGG annotation formats: https://plabase.cs.uni-tuebingen.de/pb/manual.php
Accepted file extensions:
- fasta, fas, fas_aa, faa, txt (genomic proteins, obtained from NCBI, RAST, IMG, UniProt, ...)
- txt (genomic protein KEGG annotations, as a tab-separated file)
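Since the txt extension is accepted for both input types, a file's extension alone does not always tell which annotation mode applies. A minimal Python sketch of such a check, assuming that protein FASTA starts with a `>` header line and that a KEGG annotation table is tab-separated; `detect_input_mode` is a hypothetical helper, not part of PGPT-Pred:

```python
from pathlib import Path

# Extensions listed above; ".txt" is accepted by both modes, so it is ambiguous.
FASTA_EXTS = {".fasta", ".fas", ".fas_aa", ".faa"}
TXT_EXT = ".txt"


def detect_input_mode(path: str) -> str:
    """Guess the annotation mode for an input file (hypothetical helper).

    Returns "blastp+hmmer" for protein FASTA and "kegg-mapper" for a
    tab-separated KEGG annotation table.
    """
    p = Path(path)
    ext = p.suffix.lower()
    if ext in FASTA_EXTS:
        return "blastp+hmmer"
    if ext != TXT_EXT:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")
    # A .txt file may hold either format: inspect the first non-empty line.
    with p.open() as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                return "blastp+hmmer"
            if "\t" in line:
                return "kegg-mapper"
            break
    raise ValueError(f"cannot tell the format of {path}")
```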
pgpt_blhm.py -h
pgpt_comp_fun_ascii_v2.py -h
- blastp+hmmer: genomic protein sequences are aligned with blastp against the proteins associated with the PGPT ontology, and the respective PFAM domains are then compared with hmmer against the PFAM domain models, using
pgpt_blhm.py
- IMG-KEGG-annotation Mapper: KEGG annotations (only one per protein) are mapped against the PGPT ontology, using
pgpt_comp_fun_ascii_v2.py -of pfar_kegg
- If the file format is incompatible, try our parsers:
Scripts/python/
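The two annotation modes can be illustrated with a minimal Python sketch. It is not the code of `pgpt_blhm.py` or `pgpt_comp_fun_ascii_v2.py`: all function names, the e-value cutoff, and the rule "a hit is confirmed if query and PGPT reference share at least one PFAM domain" are assumptions for illustration. BLAST hits are read from tabular output (`-outfmt 6`, where column 11 is the e-value); the KEGG mode looks up each protein's single KO in a KO-to-PGPT table, assumed to be already loaded from src/pgpt_class_kegg.txt (whose layout is not described here).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def parse_blast_tab(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Tuple[str, str]]:
    """Read BLAST tabular output (-outfmt 6) into (query, subject) pairs.

    The e-value cutoff is an illustrative assumption, not PGPT-Pred's setting.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if float(cols[10]) <= max_evalue:  # column 11 = e-value
            hits.append((cols[0], cols[1]))
    return hits


def confirm_with_pfam(
    hits: List[Tuple[str, str]],
    query_domains: Dict[str, Set[str]],
    subject_domains: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Keep hits whose query shares at least one PFAM domain (found by hmmer)
    with the matched PGPT reference protein (assumed criterion)."""
    return [
        (q, s)
        for q, s in hits
        if query_domains.get(q, set()) & subject_domains.get(s, set())
    ]


def map_kegg(annotations: Iterable[str], ko_to_pgpt: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map 'protein<TAB>KO' lines to PGPT ids, using one KO per protein."""
    pgpt_hits: Dict[str, List[str]] = defaultdict(list)
    seen: Set[str] = set()
    for line in annotations:
        parts = line.strip().split("\t")
        if len(parts) < 2 or parts[0] in seen:
            continue  # skip malformed lines and further KOs of a seen protein
        protein, ko = parts[0], parts[1]
        seen.add(protein)
        for pgpt in ko_to_pgpt.get(ko, []):
            pgpt_hits[pgpt].append(protein)
    return dict(pgpt_hits)
```

In this sketch, the raw output of `parse_blast_tab` corresponds to the "all blast hits (ignoring pfam comparison)" view, and the output of `confirm_with_pfam` to the blastp+hmmer view.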
- based on blastp+hmmer results or all blast hits (ignoring pfam comparison) of PGPTs, by applying: Web-Java-PlugIn
- based on KEGG-PGPT mapping, by applying: Web-Java-PlugIn
- based on blastp+hmmer results or all blast hits (ignoring pfam comparison) of PGPTs, by applying:
R/kronaScriptPGPT.r -m BH
- based on IMG-KEGG-PGPT mapping, by applying:
R/kronaScriptPGPT.r -m IMK
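Whether `kronaScriptPGPT.r` uses it is not stated here, but one way to hand hierarchical PGPT counts to KronaTools is its text input for `ktImportText`: one line per wedge, a count followed by the tab-separated hierarchy levels from top to bottom. A minimal Python sketch, assuming each hit is already available as its ontology path (a hypothetical representation):

```python
from collections import Counter
from typing import Iterable, Sequence


def to_krona_text(hit_paths: Iterable[Sequence[str]]) -> str:
    """Aggregate PGPT hits into KronaTools text input (ktImportText).

    Each element of hit_paths is one hit's ontology path, from the top
    level down. Identical paths are counted, and each distinct path
    becomes one line: count, then the levels, all tab-separated.
    """
    counts = Counter(tuple(path) for path in hit_paths)
    lines = ["\t".join([str(n), *path]) for path, n in sorted(counts.items())]
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a file (e.g. pgpt_krona.txt) and rendered with `ktImportText pgpt_krona.txt -o pgpt_krona.html`.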
- Download: Summary file listing all blastp+hmmer or KEGG-mapped hits of PGPTs
- Pie Chart: pie chart summarizing all blastp+hmmer hits, all blast hits (ignoring the pfam comparison), or all KEGG-PGPT hits as percentages at ontology level 2
- Krona Plot: Krona plot giving a hierarchical overview of all blastp+hmmer hits, all blast hits (ignoring the pfam comparison), or all KEGG-PGPT hits across all hierarchical levels
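The percentages behind the pie chart amount to counting hits per label at ontology level 2 and dividing by the total. A minimal Python sketch, assuming each hit is given as its ontology path from the top level down (a hypothetical representation, not the PLaBAse output format):

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def level_percentages(hit_paths: Iterable[Sequence[str]], level: int = 2) -> Dict[str, float]:
    """Share of hits per ontology label at a given level (1-based), in percent.

    Hits whose path is shorter than `level` are ignored (an assumption).
    """
    counts = Counter(path[level - 1] for path in hit_paths if len(path) >= level)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()} if total else {}
```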
- blastp+hmmer: approx. 30 minutes for a genome of 3000 genes/proteins and approx. 60 minutes for a genome of 6000 genes/proteins, depending on server workload
- IMG-KEGG-annotation Mapper: approx. 15 minutes, depending on server workload
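The two blastp+hmmer figures scale linearly (about 10 minutes per 1000 proteins), so a rough runtime estimate for other genome sizes can be extrapolated from them. A minimal Python sketch under that linearity assumption; `estimate_blhm_minutes` is a hypothetical helper, and server workload is not modelled:

```python
def estimate_blhm_minutes(n_proteins: int) -> float:
    """Rough blastp+hmmer runtime in minutes, linearly extrapolated from
    3000 proteins ~ 30 min (and 6000 ~ 60 min). Server load is ignored."""
    if n_proteins < 0:
        raise ValueError("protein count must be non-negative")
    return n_proteins * 30 / 3000
```

For example, a genome with 4500 proteins would take roughly 45 minutes under this assumption.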
- see
Statistics/stat.r
- comprises 3 scripts for:
- Genome size, gene count and PGPT count analysis
- Scoary output-to-heatmap generation
- PCA plot computation
- can be run on the PGPT counts of every ontology level; see the internal script documentation
- Dependencies (libraries): dplyr, gplots, ggplot2, ggpubr, varhandle, RColorBrewer, FactoMineR, factoextra, corrplot, pca3d, vegan, ecodist, ape, cluster, devtools, ggbiplot