Phylogenetic tree inference and heatmap drawing from genomic distances derived from ANI (average nucleotide identity) or AAI (average amino acid identity).
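The sketch below shows the idea in pure Python: it converts identity values into distances and builds a UPGMA tree. It is not taken from genomic_distance_viz.py. Several details are assumptions made for illustration: the transform distance = 1 - ANI/100, the function names `ani_to_distance` and `upgma_newick`, and the `{frozenset({a, b}): distance}` input layout.

```python
from itertools import combinations


def ani_to_distance(ani_percent):
    """Assumed transform: distance = 1 - ANI/100, with ANI given in percent."""
    return 1.0 - ani_percent / 100.0


def upgma_newick(names, dist):
    """Build a UPGMA tree and return it as a Newick string.

    names: genome labels; dist: {frozenset({a, b}): distance} for every pair.
    """
    # Active clusters: label -> (Newick subtree, number of genomes, node height)
    clusters = {name: (name, 1, 0.0) for name in names}
    d = {frozenset(pair): dist[frozenset(pair)] for pair in combinations(names, 2)}
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair of active clusters
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        # Ultrametric tree: the new node sits at half the distance
        height = d.pop(pair) / 2.0
        subtree = f"({tree_a}:{height - h_a:.6g},{tree_b}:{height - h_b:.6g})"
        new = f"_node{next_id}"  # hypothetical internal label, never printed
        next_id += 1
        # UPGMA update: size-weighted mean distance to every remaining cluster
        for other in clusters:
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            d[frozenset((new, other))] = (d_a * size_a + d_b * size_b) / (size_a + size_b)
        clusters[new] = (subtree, size_a + size_b, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


if __name__ == "__main__":
    ani = {("A", "B"): 98.0, ("A", "C"): 90.0, ("B", "C"): 90.0}
    dist = {frozenset(pair): ani_to_distance(value) for pair, value in ani.items()}
    print(upgma_newick(["A", "B", "C"], dist))
```

UPGMA always merges the closest pair and places the new node at half their distance, so the resulting tree is ultrametric. NJ, the other method offered by -m, does not assume a constant rate of change and is not shown here.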
genomic_distance_viz.py [-h] (-l LOW_TRIANGULAR_MATRIX | -t ANI_TABLE | --anirb | --aairb | --mummer | --fastani) [--input-list INPUT_LIST | --input-dir INPUT_DIR] [-d] [-x EXTENSION] [-p PREFIX] [-m {UPGMA,NJ,both,none}] [--threads THREADS] [-H] [-A] [-D] [-c] [--cluster-threshold CLUSTER_THRESHOLD] [--reroot] [--checkm-file CHECKM_FILE]
-h, --help
Show this help message and exit
-l FILE, --low-triangular-matrix FILE
Lower triangular matrix of ANI values
-t FILE, --table FILE
Tab separated table of similarity (ANI or AAI) between genomes
--anirb
Calculate ANI with ani.rb (must be installed separately; slow). Requires --input-list or --input-dir.
--aairb
Calculate AAI with aai.rb (must be installed separately). Requires --input-list or --input-dir.
--fastani
Calculate ANI with fastANI (must be installed separately; fast). Requires --input-list or --input-dir.
--mummer
Calculate ANI with MUMmer (mummer4, must be installed separately). Requires --input-list or --input-dir.
--input-list FILE
File containing a list of full paths to genome files for ANI calculation
--input-dir DIRECTORY
Path (may be relative) to a directory containing genomes for ANI calculation
-x EXTENSION, --extension EXTENSION
FASTA file extension, e.g. fna (default), fa, fasta
-d, --use-diamond
Use diamond instead of BLAST in aai.rb (faster)
-p PREFIX, --prefix PREFIX
Prefix for output files
-m {UPGMA,NJ,both,none}, --tree-method {UPGMA,NJ,both,none}
Phylogenetic tree inference method (default UPGMA)
-H, --heatmap
Draw a heatmap
-A, --ascii-tree
Draw ASCII tree to stdout
-D, --plot-dendrogram
Plot a dendrogram
-c, --print-clusters
Print genomic clusters to file
--cluster-threshold CLUSTER_THRESHOLD
Threshold for genomic clusters output
--reroot
Reroot the tree at its midpoint. May cause errors or produce incorrect trees.
--threads THREADS
Number of CPU threads (where possible)
--checkm-file CHECKM_FILE
CheckM output file used to select the best representative genome in each cluster.
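The following hypothetical sketch illustrates the clustering options. This README does not describe how genomic_distance_viz.py actually forms clusters or scores CheckM results. Several choices below are therefore assumptions: single linkage on distances, a threshold expressed as a distance rather than an identity, and a completeness - 5 * contamination score. The names `threshold_clusters` and `pick_representative` are made up.

```python
def threshold_clusters(names, dist, threshold):
    """Group genomes whose pairwise distance is <= threshold (single linkage).

    dist: {frozenset({a, b}): distance}. Treating the threshold as a distance
    and using single linkage are assumptions of this sketch.
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join every pair that is close enough
    for pair, distance in dist.items():
        if distance <= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


def pick_representative(cluster, quality):
    """Return the genome with the highest completeness - 5 * contamination.

    quality: {genome: (completeness, contamination)}, e.g. values read from a
    CheckM table. The scoring rule is a common heuristic assumed for illustration.
    """
    return max(cluster, key=lambda g: quality[g][0] - 5 * quality[g][1])
```

With single linkage, two genomes can share a cluster through a chain of close neighbours even if they are far apart themselves. Average or complete linkage would avoid this chaining at the cost of more computation.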
Lower triangular matrix: the matrix produced by fastANI with the --matrix option, or an equivalent matrix from any other software.
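A lower triangular matrix can be read with a few lines of Python. This sketch assumes the layout fastANI writes with --matrix. The first line gives the number of genomes. Each following line holds one genome name, then tab-separated ANI values against all preceding genomes, with "NA" where no value was reported. The function names are hypothetical.

```python
def parse_lower_triangular(lines):
    """Return (names, {frozenset({a, b}): ANI}) from lower-triangular matrix lines."""
    it = iter(lines)
    n = int(next(it))  # first line: number of genomes
    names, ani = [], {}
    for _ in range(n):
        fields = next(it).rstrip("\r\n").split("\t")
        name, values = fields[0], fields[1:]
        # Row i holds values against the i genomes listed before it
        for other, value in zip(names, values):
            if value != "NA":
                ani[frozenset((name, other))] = float(value)
        names.append(name)
    return names, ani


def read_lower_triangular(path):
    """Read the matrix from a file, e.g. fastANI's <output>.matrix."""
    with open(path) as handle:
        return parse_lower_triangular(handle)
```

Missing ("NA") pairs are simply absent from the result, so any downstream tree method would need a policy for them. For example, it could assign the maximum distance.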
ANI table: a tab-separated file with the following structure:
Genome1 Genome2 Identity[ ...]
It can be produced, for example, with ani.rb:
for i in *fna; do for j in *fna; do echo -ne "$i\t$j\t"; ani.rb -q -a -1 $i -2 $j 2>/dev/null; done; done >> ani.rb.tsv
Or with fastANI, whose tab-separated output follows the same structure.
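A table of this structure can be folded into one value per genome pair. The sketch below is illustrative only. It averages reciprocal comparisons (A vs B and B vs A), ignores self-comparisons, and skips rows whose identity is empty or non-numeric. These are choices made here, not documented behavior of the script.

```python
def parse_ani_table(lines):
    """Turn 'Genome1<TAB>Genome2<TAB>Identity[...]' rows into {frozenset({a, b}): ANI}."""
    sums, counts = {}, {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Skip short rows and self-comparisons
        if len(fields) < 3 or fields[0] == fields[1]:
            continue
        try:
            identity = float(fields[2])
        except ValueError:
            continue  # header line or missing value (e.g. a failed ani.rb run)
        key = frozenset(fields[:2])
        sums[key] = sums.get(key, 0.0) + identity
        counts[key] = counts.get(key, 0) + 1
    # Average the A-vs-B and B-vs-A values
    return {key: sums[key] / counts[key] for key in sums}
```

Extra columns after the identity, such as fastANI's fragment counts, are ignored.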
If ani.rb, aai.rb, MUMmer4 or fastANI is installed in your environment, you may use the corresponding option to calculate genome identity for a list of genomes (--input-list) or for a directory containing genomes (--input-dir). Genomes must be in FASTA format; the file extension can be set with the --extension option.
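For --input-dir with --extension, an all-vs-all fastANI run could be prepared as shown below. The fastANI options --ql, --rl, -o, -t and --matrix are real. Whether genomic_distance_viz.py invokes fastANI this way is an assumption, and the helper names are hypothetical.

```python
from pathlib import Path


def collect_genomes(input_dir, extension="fna"):
    """Sorted paths of regular files in input_dir whose names end in .<extension>."""
    return sorted(str(p) for p in Path(input_dir).glob(f"*.{extension}") if p.is_file())


def write_list_file(paths, list_file):
    """Write one genome path per line, the list format fastANI's --ql/--rl expect."""
    with open(list_file, "w") as handle:
        handle.write("".join(f"{path}\n" for path in paths))


def fastani_command(list_file, output, threads=1):
    """All-vs-all fastANI run: the same list serves as queries and references."""
    return ["fastANI", "--ql", list_file, "--rl", list_file,
            "-o", output, "-t", str(threads), "--matrix"]
```

To use these pieces, write the list with `write_list_file(collect_genomes("genomes", "fna"), "genomes.txt")`. Then run the command with `subprocess.run(fastani_command("genomes.txt", "ani.tsv", 8), check=True)`. With --matrix, fastANI should also write the lower triangular matrix to ani.tsv.matrix, next to the tab-separated table.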
- dendropy
- matplotlib
- numpy
- scipy
May be installed with pip:
pip install dendropy matplotlib numpy scipy
or conda
conda install dendropy matplotlib numpy scipy
Optional:
Known issue: AssertionError during midpoint rooting (--reroot):
  File "/somewhere/site-packages/dendropy/datamodel/treemodel.py", line 5076, in reroot_at_midpoint
    assert break_on_node is not None or target_edge is not None
AssertionError
Basic genome statistics retrieved with Biopython.
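The sketch below shows the kind of summary such a tool produces, using the standard library alone. genomic_stats.py itself relies on Biopython, and this README does not list its exact statistics. The fields below (contig count, total length, longest contig, GC%) and the function names are therefore illustrative assumptions.

```python
def fasta_records(lines):
    """Yield (record_id, sequence_length, gc_count) for each record in FASTA lines."""
    record_id, length, gc = None, 0, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, length, gc
            # Record ID is the first word of the header
            record_id = (line[1:].split() or [""])[0]
            length, gc = 0, 0
        else:
            seq = line.upper()
            length += len(seq)
            gc += seq.count("G") + seq.count("C")
    if record_id is not None:
        yield record_id, length, gc


def basic_stats(lines):
    """Contig count, total length, longest contig and GC% (GC over all bases, Ns included)."""
    lengths, gc_total = [], 0
    for _, length, gc in fasta_records(lines):
        lengths.append(length)
        gc_total += gc
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_length": total,
        "longest": max(lengths, default=0),
        "gc_percent": round(100.0 * gc_total / total, 2) if total else 0.0,
    }
```

To run it on a file, pass an open file handle, since it yields lines: `basic_stats(open("genome.fna"))`.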
genomic_stats.py [-h] -i INPUT [-o OUTPUT_PREFIX] [-w] [-f {human,json,table}] [-X ADDITIONAL_METRIC] [-s SQLITE_DB] [-e EXTENSION] [-b]
Arguments:
-h, --help
Show this help message and exit
-i INPUT, --input INPUT
Input FASTA file or a directory of FASTA files.
-o OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
Output prefix. Omit to print to stdout. If the output file already exists, output is appended to it.
-w, --overwrite
Overwrite the output file instead of appending to it.
-f {human,json,table}, --format {human,json,table}
Output format: human (human-friendly, default), json, or table (tab-delimited).
-X ADDITIONAL_METRIC, --additional-metric ADDITIONAL_METRIC
Additional x for which Nx and Lx are calculated (e.g. 90 for N90 and L90). Integer between 1 and 99.
-s SQLITE_DB, --sqlite-db SQLITE_DB
Path to an SQLite database for storing the data.
-e EXTENSION, --extension EXTENSION
File extension used to select files when a directory is provided. By default, all files are processed.
-b, --basename-as-prefix
Take the output file prefix from the input file or directory name.
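The hypothetical helper `nx_lx` below shows the usual definitions of Nx and Lx used by -X. Sort contigs from longest to shortest and accumulate their lengths until at least x% of the total is covered. Nx is the length of the contig that crosses the threshold, and Lx is the number of contigs used. This README does not state which x values genomic_stats.py reports by default.

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a list of contig lengths."""
    if not 1 <= x <= 99:
        raise ValueError("x must be an integer between 1 and 99")
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100.0
    running = 0
    # Walk from the longest contig down until x% of the assembly is covered
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return 0, 0  # only reached for an empty assembly
```

For contigs of 100, 80, 60, 40 and 20 bp, `nx_lx(lengths, 50)` gives N50 = 80 and L50 = 2. `nx_lx(lengths, 90)` gives N90 = 40 and L90 = 4.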
- Biopython
May be installed with pip:
pip install biopython
or conda
conda install biopython