
Phyrvm: A one-stop software solution for RNA virus mining and host-association prediction

Phyrvm automates RNA virus detection in three stages: RNA virus discovery, phylogenetic analysis, and phylogeny-based virus characterization. It outputs viral sequences, phylogenetic trees, and detailed reports, offering flexibility and accuracy in identifying putative viral sequences and their host associations.


  • python>=3.8
  • R 4.2

Step 1: Install conda and third-party dependencies

Phyrvm requires third-party packages from the conda-forge and bioconda channels:

conda install -c conda-forge -c bioconda blast bbmap seqkit mafft megahit trimal pplacer taxonkit bowtie2 cd-hit
conda install -c conda-forge -c bioconda diamond==2.0.15 samtools==1.16.1
pip install Bio biopython DendroPy matplotlib numpy pandas regex seaborn tqdm


  • Tool versions used, for reference:

    • bbduk.sh (bbmap v39.01); seqkit v2.4.0; bowtie2 v2.5.1; megahit v1.2.9
    • mafft v7.520; trimal v1.4.1; makeblastdb, blastn, blastp v2.13.0+
    • samtools v1.16.1; diamond v2.0.15
  • The taxonkit dataset must also be downloaded.
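Before moving on, it can help to confirm that the command-line tools listed above are actually visible on PATH. The following is a minimal sketch, not part of Phyrvm itself: the helper name `missing_tools` is hypothetical, and the executable names `pplacer` and `cd-hit` are assumed to match the binaries installed by the conda packages of the same name.

```python
import shutil

# Executable names taken from the install commands and version list above.
# "pplacer" and "cd-hit" are assumed to be the binary names of those packages.
REQUIRED_TOOLS = [
    "bbduk.sh", "seqkit", "bowtie2", "megahit", "mafft", "trimal",
    "makeblastdb", "blastn", "blastp", "samtools", "diamond",
    "pplacer", "taxonkit", "cd-hit",
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Run it inside the activated conda environment; a tool reported as missing usually means the environment is not active or the package was not installed.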

Step 2: Install Phyrvm via pip

All Python dependencies are installed automatically.

pip install phyrvm

Step 3: Install R and the required R packages

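# Install the R packages
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install any packages in pkg that are missing, then load all of them
ipak <- function(pkg){
    new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
    if (length(new.pkg))
        install.packages(new.pkg, dependencies = TRUE)
    sapply(pkg, require, character.only = TRUE)
}
# Call ipak() with a character vector of the required R package names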

Note: tidyverse depends on systemfonts. If tidyverse fails to install, you may need to install systemfonts first with the following command:

conda install r-systemfonts

Step 4: Download and configure the databases

Phyrvm requires an environment variable named PHYRVM_DB_PATH, which points to the parent directory of the following databases.

See below for specific database configurations.

# Set the PHYRVM_DB_PATH environment variable
export PHYRVM_DB_PATH=/path/to/the/database/
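A quick way to confirm the variable is set correctly is to check it from Python. This is only a sketch under stated assumptions: `check_db_root` is a hypothetical helper, and the two subdirectory names are the ones that appear in the commands of this section (`rRNA` and `accession2taxid`); the full database layout may contain more.

```python
import os
from pathlib import Path

# Subdirectories referenced by the commands in this section; the complete
# database layout may include more entries than these two.
EXPECTED_SUBDIRS = ("rRNA", "accession2taxid")


def check_db_root(env=os.environ, subdirs=EXPECTED_SUBDIRS):
    """Return a list of problems found; an empty list means none were flagged."""
    root = env.get("PHYRVM_DB_PATH")
    if not root:
        return ["PHYRVM_DB_PATH is not set"]
    root_path = Path(root)
    if not root_path.is_dir():
        return [f"{root} is not a directory"]
    return [f"missing subdirectory: {name}"
            for name in subdirs if not (root_path / name).is_dir()]


if __name__ == "__main__":
    for line in check_db_root() or ["no problems found"]:
        print(line)
```

Missing subdirectories are expected before the databases have been downloaded; the check is most useful after configuration is finished.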


  • The databases take up a lot of space, so make sure you have enough disk space. If you already have these databases, you can skip the download step and just configure them.

  • Download speed depends on your network connection. You can also use other download methods, such as ascp.


  • 1.1 Download the file from this link.

  • 1.2 Decompress the file and use bowtie2 to build the index.

    mkdir -p "$PHYRVM_DB_PATH/rRNA"
    bunzip2 -cv Phyrvm_rRNA_db.fasta.bz2 > "$PHYRVM_DB_PATH/rRNA/Phyrvm_rRNA_db.fasta"
    bowtie2-build "$PHYRVM_DB_PATH/rRNA/Phyrvm_rRNA_db.fasta" "$PHYRVM_DB_PATH/rRNA/rRNA_cutout_ref"
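After bowtie2-build finishes, the index should consist of six files sharing the prefix `rRNA_cutout_ref`. The sketch below checks for them; `bowtie2_index_complete` is a hypothetical helper name, and the check only tests that the files exist, not that they are valid.

```python
from pathlib import Path

# Suffixes written by bowtie2-build for one index; large indexes use the
# same names with a ".bt2l" extension instead of ".bt2".
INDEX_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def bowtie2_index_complete(prefix):
    """Return True if all six index files exist for `prefix` (.bt2 or .bt2l)."""
    for ext in ("bt2", "bt2l"):
        if all(Path(f"{prefix}.{part}.{ext}").is_file() for part in INDEX_PARTS):
            return True
    return False
```

For example, `bowtie2_index_complete(os.path.join(os.environ["PHYRVM_DB_PATH"], "rRNA", "rRNA_cutout_ref"))` (with `import os`) should return True once the index has been built.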


#Download the `PROT_ACC2TAXID` file
wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
wget -c https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz.md5

#Check for the file integrity
md5sum -c prot.accession2taxid.gz.md5

#Decompress the file into the database directory
mkdir -p "$PHYRVM_DB_PATH/accession2taxid"
gunzip -c prot.accession2taxid.gz > "$PHYRVM_DB_PATH/accession2taxid/prot.accession2taxid"
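On systems without `md5sum`, the same integrity check can be approximated in Python. This is a minimal sketch with hypothetical helper names (`md5_of_file`, `matches_md5_file`); it assumes the `.md5` file follows the usual md5sum layout, with the hex digest as the first whitespace-separated field.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(data_path, md5_path):
    """Compare a file against the first field of an md5sum-style checksum file."""
    with open(md5_path) as handle:
        expected = handle.read().split()[0].lower()
    return md5_of_file(data_path) == expected
```

Calling `matches_md5_file("prot.accession2taxid.gz", "prot.accession2taxid.gz.md5")` in the download directory returns True when the download is intact. Reading in chunks keeps memory use low, which matters for a file of this size.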


phyrvm method [options]

  • method:
    • end_to_end
    • assembly_and_basic_annotation
    • phylogenetic_analysis


phyrvm end_to_end -i 1.fastq -i2 2.fastq \
    -out_dir out_path --threads 60 -classify_model All --keep-dup
phyrvm assembly_and_basic_annotation -i 1.fastq -i2 2.fastq \
    -out_dir out_path --threads 60
phyrvm phylogenetic_analysis -classify_i test/test_contig.fasta \
    -out_dir out_path -classify_model All --threads 90 --keep-dup
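When many samples need to be processed, the end_to_end command above can be assembled programmatically. The sketch below is not an official Phyrvm API: `build_end_to_end_cmd` is a hypothetical helper, and it uses only the flags that appear in the example commands above, without assuming anything about other options.

```python
import shlex


def build_end_to_end_cmd(reads_1, reads_2, out_dir, threads=8,
                         classify_model="All", keep_dup=True):
    """Assemble the argument list for `phyrvm end_to_end` from the flags shown above."""
    cmd = ["phyrvm", "end_to_end", "-i", reads_1, "-i2", reads_2,
           "-out_dir", out_dir, "--threads", str(threads),
           "-classify_model", classify_model]
    if keep_dup:
        cmd.append("--keep-dup")
    return cmd


if __name__ == "__main__":
    cmd = build_end_to_end_cmd("1.fastq", "2.fastq", "out_path", threads=60)
    # Print the shell-quoted command; to run it, pass `cmd` to
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.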