Phyrvm: A one-stop software solution for RNA virus mining and host-association prediction

Primary language: Python. License: MIT.

Phyrvm automates RNA virus detection in three stages: RNA virus discovery, phylogenetic analysis, and phylogeny-based virus characterization. It outputs viral sequences, phylogenetic trees, and detailed reports, offering flexibility and accuracy in identifying putative viral sequences and their host associations.

Installation

  • python>=3.8
  • R 4.2

Step 1: Install conda and third-party dependencies

Phyrvm requires third-party packages from the conda-forge and bioconda channels:

conda install -c conda-forge -c bioconda blast bbmap seqkit mafft megahit trimal pplacer taxonkit bowtie2 cd-hit
conda install -c conda-forge -c bioconda diamond==2.0.15 samtools==1.16.1
pip install Bio biopython DendroPy matplotlib numpy pandas regex seaborn tqdm

Notes:

  • Tool versions, for reference:

    • bbduk.sh (bbmap) v39.01; seqkit v2.4.0; bowtie2 v2.5.1; megahit v1.2.9
    • mafft v7.520; trimal v1.4.1; makeblastdb, blastn, blastp v2.13.0+
    • samtools v1.16.1; diamond v2.0.15
  • The taxonkit dataset (the NCBI taxonomy dump files) must also be downloaded.
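
Before moving on, it can help to confirm that the command-line tools listed above are actually on the PATH. The following is a minimal sketch and not part of Phyrvm: the `find_missing_tools` helper is hypothetical, and the executable names are assumptions derived from the tools named in the notes (a given conda build may install binaries under slightly different names).

```python
import shutil

# Executable names assumed from the tools listed in the notes above;
# adjust them if your installation uses different names.
REQUIRED_TOOLS = [
    "bbduk.sh", "seqkit", "bowtie2", "bowtie2-build", "megahit", "mafft",
    "trimal", "makeblastdb", "blastn", "blastp", "samtools", "diamond",
    "taxonkit",
]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All listed tools were found on the PATH.")
```

Run it inside the conda environment in which the dependencies were installed; an empty result only shows that the executables exist, not that their versions match the ones listed.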

Step 2: Install Phyrvm via pip

All Python dependencies are installed automatically.

pip install phyrvm

Step 3: Install R and the required R packages

# Start R, then install the required packages
R
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("ggtree")
packages <- c("tidyverse", "ggplot2", "RColorBrewer", "phangorn", "networkD3", "jsonlite", "dplyr")
# Install any packages that are missing, then load them all
ipak <- function(pkg) {
    new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
    if (length(new.pkg))
        install.packages(new.pkg)
    sapply(pkg, require, character.only = TRUE)
}
ipak(packages)

Note: tidyverse depends on systemfonts, which you may need to install with the following command:

conda install r-systemfonts

Step 4: Download and configure the databases

Phyrvm requires an environment variable named PHYRVM_DB_PATH, which points to the parent directory of the following databases.

See below for the configuration of each database.

# Set the PHYRVM_DB_PATH environment variable
export PHYRVM_DB_PATH=/path/to/the/database/

Notes:

  • The databases take up a lot of space, so make sure you have enough disk space. If you already have these databases, you can skip the download step and just configure them.

  • The download speed of the databases depends on your internet connection. Other download methods, such as ascp, can also be used.
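
Since the notes above ask for enough disk space, a quick check of the database root before downloading may save a failed run. This is a minimal sketch and not part of Phyrvm: `check_db_root` is a hypothetical helper, and the `min_free_gb` default is an arbitrary placeholder, because the required space is not stated here.

```python
import os
import shutil


def check_db_root(min_free_gb=100.0, env=os.environ):
    """Return (db_path, free_gib) for the directory named by PHYRVM_DB_PATH.

    Raises RuntimeError if the variable is unset, the directory is missing,
    or less than `min_free_gb` GiB is free. The threshold is a placeholder,
    not a documented requirement.
    """
    db_path = env.get("PHYRVM_DB_PATH")
    if not db_path:
        raise RuntimeError("PHYRVM_DB_PATH is not set")
    if not os.path.isdir(db_path):
        raise RuntimeError(f"PHYRVM_DB_PATH is not a directory: {db_path}")
    free_gib = shutil.disk_usage(db_path).free / 1024**3
    if free_gib < min_free_gb:
        raise RuntimeError(f"Only {free_gib:.1f} GiB free under {db_path}")
    return db_path, free_gib
```

Calling `check_db_root(min_free_gb=50)` in the shell session where PHYRVM_DB_PATH was exported returns the path and the free space, or raises with a message describing what is missing.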

rRNA

  • 1.1 Download the file from this link.

  • 1.2 Decompress the file and use bowtie2 to build the index.

    mkdir -p "$PHYRVM_DB_PATH/rRNA"
    bunzip2 -cv Phyrvm_rRNA_db.fasta.bz2 > "$PHYRVM_DB_PATH/rRNA/Phyrvm_rRNA_db.fasta"
    bowtie2-build "$PHYRVM_DB_PATH/rRNA/Phyrvm_rRNA_db.fasta" "$PHYRVM_DB_PATH/rRNA/rRNA_cutout_ref"

PROT_ACC2TAXID

# Download the `PROT_ACC2TAXID` file
wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz.md5

# Check file integrity
md5sum -c prot.accession2taxid.gz.md5

# Decompress the file into the database directory
mkdir -p "$PHYRVM_DB_PATH/accession2taxid"
gunzip -c prot.accession2taxid.gz > "$PHYRVM_DB_PATH/accession2taxid/prot.accession2taxid"
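
Once the rRNA and PROT_ACC2TAXID steps are done, the resulting layout can be checked against the paths used in the commands above. The sketch below is hypothetical and not part of Phyrvm: `missing_db_entries` is an illustrative helper, and it covers only the two databases described in this section. The Bowtie2 index is matched by prefix because bowtie2-build writes several `.bt2` (or `.bt2l`) files.

```python
import os

# Relative paths taken from the rRNA and PROT_ACC2TAXID steps above.
EXPECTED_FILES = [
    os.path.join("rRNA", "Phyrvm_rRNA_db.fasta"),
    os.path.join("accession2taxid", "prot.accession2taxid"),
]
# Prefix passed to bowtie2-build in the rRNA step.
INDEX_PREFIX = os.path.join("rRNA", "rRNA_cutout_ref")


def missing_db_entries(db_root):
    """Return the expected database entries that are absent under db_root."""
    missing = [rel for rel in EXPECTED_FILES
               if not os.path.isfile(os.path.join(db_root, rel))]
    index_dir = os.path.join(db_root, os.path.dirname(INDEX_PREFIX))
    index_name = os.path.basename(INDEX_PREFIX)
    # The index counts as present if at least one matching index file exists.
    has_index = os.path.isdir(index_dir) and any(
        name.startswith(index_name + ".") and name.endswith((".bt2", ".bt2l"))
        for name in os.listdir(index_dir)
    )
    if not has_index:
        missing.append(INDEX_PREFIX + ".*.bt2")
    return missing
```

For example, `missing_db_entries(os.environ["PHYRVM_DB_PATH"])` returns an empty list when both files and at least one index file are present.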

Usage

phyrvm method [options]

  • method:
    • end_to_end
    • assembly_and_basic_annotation
    • phylogenetic_analysis

Example

phyrvm end_to_end -i 1.fastq -i2 2.fastq \
    -out_dir out_path --threads 60 -classify_model All --keep-dup

phyrvm assembly_and_basic_annotation -i 1.fastq -i2 2.fastq \
    -out_dir out_path --threads 60

phyrvm phylogenetic_analysis -classify_i test/test_contig.fasta \
    -out_dir out_path -classify_model All --threads 90 --keep-dup
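
When several paired-end samples need the same end_to_end run, the command line can be assembled from Python. This is a minimal sketch that uses only the flags shown in the examples above; `build_end_to_end_cmd` and the sample names are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex


def build_end_to_end_cmd(fastq1, fastq2, out_dir, threads=60,
                         classify_model="All", keep_dup=True):
    """Build the argument list for one `phyrvm end_to_end` run."""
    cmd = ["phyrvm", "end_to_end", "-i", fastq1, "-i2", fastq2,
           "-out_dir", out_dir, "--threads", str(threads),
           "-classify_model", classify_model]
    if keep_dup:
        cmd.append("--keep-dup")
    return cmd


if __name__ == "__main__":
    # Hypothetical sample names; replace with your own paired-end files.
    for sample in ["sampleA", "sampleB"]:
        cmd = build_end_to_end_cmd(f"{sample}_1.fastq", f"{sample}_2.fastq",
                                   f"out/{sample}")
        # Print the command; pass `cmd` to subprocess.run(cmd, check=True)
        # to execute it instead.
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.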