VirSorter
This is not the source code of the VirSorter App (which is available on CyVerse). This is a fork of the VirSorter repository, cleaned up a touch so it runs more easily outside of Docker. If you would like to check out the real VirSorter App, simply head over to Big Simon's repo.
I forked this repository to incorporate it into the Baby Virome pipeline, a lightweight and (somewhat) scalable virome (viral metagenome) analysis pipeline.
The only modifications you'll see in this repository are meant to improve VirSorter's running time and command-line documentation (I hope). Oh, and to remove all of the Docker-related features and documentation. It's really difficult to run Docker on Linux systems at your institution or company, because they won't dish out those sweet, sweet sudo privileges (even if you have a PhD). And yeah, you can run Docker without sudo, but good luck getting your IT department on board with that.
Publication
- VirSorter: mining viral signal from microbial genomic data
- https://peerj.com/articles/985/
- PubMed 26038737
Result files
The main output files of VirSorter are:
File | Description |
---|---|
VIRSorter_global-phage-signal.csv | Comma-separated table listing the viral predictions from VirSorter (one row per prediction). |
Metrics_files/VIRSorter_affi-contigs.tab | Pipe-delimited table listing the annotation of all predicted ORFs in all contigs. More details below. |
Predicted_viral_sequences/ | FASTA and Genbank files of predicted viral sequences. |
Fasta_files/ | Intermediary files, including predicted proteins. |
Tab_files/ | Intermediary files, including results of the search against PFAM and the virus database. |
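To pull the predictions out of VIRSorter_global-phage-signal.csv in a script, a minimal Python sketch is shown here. It assumes that header and section-title lines in that file start with "#" and that every other non-blank line is one prediction; the column layout is not documented in this README, so the sketch returns raw fields and leaves interpretation to you. The function name read_predictions is hypothetical.

```python
import csv
from typing import Iterable, List


def read_predictions(lines: Iterable[str]) -> List[List[str]]:
    """Return one list of fields per prediction row.

    Assumption: header/section lines start with '#'; blank lines are skipped.
    """
    data_lines = (line for line in lines if line.strip() and not line.startswith("#"))
    return list(csv.reader(data_lines))
```

Usage: with open("VIRSorter_global-phage-signal.csv", newline="") as fh: rows = read_predictions(fh). Skipping "#" lines rather than hard-coding row offsets keeps the sketch working if the number of header lines changes.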
More details on the VIRSorter_affi-contigs.tab file: lines starting with ">" are headers, i.e. information about the contig (contig name, number of genes, and "c" for circular or "l" for linear). All other lines describe individual genes, with the following columns: gene name, start, stop, length, strand, hit in the virus protein cluster database, hit score, hit e-value, category of the virus protein cluster (see below), hit in PFAM, hit score, hit e-value.
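The header/gene structure described above maps naturally onto a list of contigs, each holding its genes. A minimal Python sketch, assuming header lines are written as >name|number_of_genes|c-or-l and gene lines carry the twelve pipe-separated columns in the order listed above; the column names, the Contig class, and parse_affi_contigs are hypothetical labels chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Descriptive names (chosen for this sketch) for the gene columns, in README order.
GENE_COLUMNS = [
    "gene", "start", "stop", "length", "strand",
    "virus_hit", "virus_score", "virus_evalue", "virus_category",
    "pfam_hit", "pfam_score", "pfam_evalue",
]


@dataclass
class Contig:
    name: str
    n_genes: int
    circular: bool  # True for "c", False for "l"
    genes: List[Dict[str, str]] = field(default_factory=list)


def parse_affi_contigs(lines: Iterable[str]) -> List[Contig]:
    """Group pipe-delimited gene lines under the preceding '>' header line."""
    contigs: List[Contig] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("|")
        if line.startswith(">"):
            # Assumed header order: name, number of genes, shape ("c" or "l").
            name, n_genes, shape = fields[0][1:], int(fields[1]), fields[2]
            contigs.append(Contig(name, n_genes, shape == "c"))
        elif contigs:
            # Values stay as strings; missing hits are kept exactly as written.
            contigs[-1].genes.append(dict(zip(GENE_COLUMNS, fields)))
        else:
            raise ValueError("gene line found before any '>' contig header")
    return contigs
```

Comparing len(contig.genes) with contig.n_genes after parsing is a cheap sanity check that the file was read completely.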
The categories of virus clusters represent the range of genomes in which each virus cluster was detected:
- 0: hallmark gene found in Caudovirales
- 1: non-hallmark gene found in Caudovirales
- 2: non-hallmark gene found exclusively in virome(s)
- 3: hallmark gene not found in Caudovirales
- 4: non-hallmark gene not found in Caudovirales
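The category codes can be turned into a lookup table for summarizing a contig's genes. A minimal Python sketch, assuming the category column holds a bare integer from 0 to 4 and that genes without a virus cluster hit carry some non-numeric placeholder (the README does not say which); summarize_categories and count_hallmark_genes are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, Optional

# Category codes as listed in this README.
VIRUS_CLUSTER_CATEGORIES = {
    0: "hallmark gene found in Caudovirales",
    1: "non-hallmark gene found in Caudovirales",
    2: "non-hallmark gene found exclusively in virome(s)",
    3: "hallmark gene not found in Caudovirales",
    4: "non-hallmark gene not found in Caudovirales",
}
HALLMARK_CATEGORIES = {0, 3}


def _as_category(raw: str) -> Optional[int]:
    """Return the integer code, or None for any other value (assumed: no hit)."""
    value = raw.strip()
    if value.isdigit() and int(value) in VIRUS_CLUSTER_CATEGORIES:
        return int(value)
    return None


def summarize_categories(raw_values: Iterable[str]) -> Counter:
    """Count genes per category description; unrecognized values count as no hit."""
    counts: Counter = Counter()
    for raw in raw_values:
        code = _as_category(raw)
        label = VIRUS_CLUSTER_CATEGORIES[code] if code is not None else "no virus cluster hit"
        counts[label] += 1
    return counts


def count_hallmark_genes(raw_values: Iterable[str]) -> int:
    """Number of genes in a hallmark category (0 or 3)."""
    return sum(1 for raw in raw_values if _as_category(raw) in HALLMARK_CATEGORIES)
```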
Dependencies
Check out the INSTALL.md file.
Data Container
The 12 GB of data that VirSorter depends on lives in a separate data container called "virsorter-data".
This is its Dockerfile:
FROM perl:latest
LABEL maintainer="Ken Youens-Clark <kyclark@email.arizona.edu>"
COPY Generic_ref_file.refs /data/
COPY PFAM_27 /data/PFAM_27
COPY Phage_gene_catalog /data/Phage_gene_catalog
COPY Phage_gene_catalog_plus_viromes /data/Phage_gene_catalog_plus_viromes
COPY VirSorter_Readme.txt /data
COPY VirSorter_Readme_viromes.txt /data
VOLUME ["/data"]
Then build the image and create the container:
$ docker build -t kyclark/virsorter-data .
$ docker create --name virsorter-data kyclark/virsorter-data /bin/true
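When running outside Docker, the same data has to be available as a plain local directory instead of a container volume. A minimal Python sketch that checks a local copy for the entries the Dockerfile copies into /data; the directory path you pass in and the helper name missing_data_entries are hypothetical, and the check only tests that each entry exists, not that its contents are complete.

```python
from pathlib import Path
from typing import List, Union

# Entries copied into /data by the virsorter-data Dockerfile.
EXPECTED_DATA_ENTRIES = [
    "Generic_ref_file.refs",
    "PFAM_27",
    "Phage_gene_catalog",
    "Phage_gene_catalog_plus_viromes",
    "VirSorter_Readme.txt",
    "VirSorter_Readme_viromes.txt",
]


def missing_data_entries(data_dir: Union[str, Path]) -> List[str]:
    """Return the expected entries that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_DATA_ENTRIES if not (root / name).exists()]
```

An empty result means every expected file and directory is present; anything returned is worth fixing before launching a long VirSorter run.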
Authors
Simon Roux (roux.8@osu.edu) is the author of VirSorter.
Rev DJN 26Jan2018