An accurate and sensitive bacterial plasmid identification tool based on deep machine-learning of shared k-mers and genomic features.
- Currently tested on CentOS 7 and Ubuntu 20.04; it should also work on other Linux distributions
- A minimum of 32GB system memory is required for kmer-db to load the databases
- A CPU with AVX instruction support is required (needed by kmer-db)
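The requirements above can be checked before downloading anything. The following is a minimal sketch, assuming a Linux system that exposes `/proc/meminfo` and `/proc/cpuinfo`; the function names `mem_total_gib` and `has_avx` are made up for this example. The kernel usually reports slightly less memory than is physically installed, so the sketch prints the value instead of enforcing a hard limit.

```shell
#!/bin/sh
# Pre-flight sketch for the memory and AVX requirements (helper names are hypothetical).

# Print total memory in whole GiB, parsed from a meminfo-style file.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed if a cpuinfo-style file lists "avx" as a whole word among the CPU flags.
has_avx() {
    grep -qw avx "${1:-/proc/cpuinfo}"
}

if [ -r /proc/meminfo ] && [ -r /proc/cpuinfo ]; then
    echo "Total memory: $(mem_total_gib) GiB (kmer-db needs about 32 GB)"
    if has_avx; then
        echo "AVX: supported"
    else
        echo "AVX: not reported by this CPU"
    fi
fi
```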
Please download and decompress our pre-built databases.
The pre-built databases are available at Zenodo and Google Drive.
The download contains two files, plasmerMainDB.tar.xz and customizedKraken2DB.tar.xz.
Check the sha1sum:
$ sha1sum plasmerMainDB.tar.xz
0b08f5c30d60b137f54de6024ab7557031850db6 plasmerMainDB.tar.xz
$ sha1sum customizedKraken2DB.tar.xz
b14efdd9232fd5f6d066716bd8e3e6ca80c9c0de customizedKraken2DB.tar.xz
Extract the contents into the same directory, and provide the absolute path of the directory to the -d parameter on the command line.
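The verification and extraction steps can be combined so that an archive is unpacked only when its checksum matches. This is a sketch, not part of Plasmer: the helper names `check_sha1` and `verify_and_extract` and the target directory `/data/plasmerDB` are hypothetical, and it does not check how the archives lay out their contents once unpacked.

```shell
#!/bin/sh
# Sketch: verify an archive against an expected SHA-1 digest, then extract it.

# Succeed only if the SHA-1 of file $1 equals the expected hex digest $2.
check_sha1() {
    actual=$(sha1sum "$1" 2>/dev/null | awk '{ print $1 }')
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Verify archive $1 against digest $2, then extract it into directory $3.
verify_and_extract() {
    check_sha1 "$1" "$2" || { echo "checksum mismatch: $1" >&2; return 1; }
    mkdir -p "$3" && tar -xJf "$1" -C "$3"
}

# Example usage (the target directory is a placeholder; use an absolute path):
# verify_and_extract plasmerMainDB.tar.xz 0b08f5c30d60b137f54de6024ab7557031850db6 /data/plasmerDB
# verify_and_extract customizedKraken2DB.tar.xz b14efdd9232fd5f6d066716bd8e3e6ca80c9c0de /data/plasmerDB
```

The same directory is then what you pass to -d.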
We recommend running Plasmer with Docker, which spares you from installing Plasmer and its dependencies yourself. Running Plasmer directly in a Linux shell is also possible.
You can simply install Plasmer using conda:
conda install -c iskoldt -c bioconda -c conda-forge -c defaults plasmer
If you do not use conda, you can install Plasmer from scratch as follows.
First make sure all the required dependencies are installed:
seqkit 2.2.0
python 3.10.4 (gzip; os; sys; Bio)
perl v5.26.2
kmer-db 1.9.4
Prodigal V2.6.3
HMMER 3.3.2
BLAST 2.10.1+
INFERNAL 1.1.4
diamond v2.0.8.146
GNU Parallel 20220722
Kraken version 2.1.2
R version 4.2.0 (hash; randomForest 4.7-1.1)
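A quick way to see which of these tools are still missing is to look each one up on your PATH. The sketch below is only illustrative: the executable names are the usual ones shipped by each package (for example `hmmsearch` for HMMER, `cmscan` for INFERNAL, `blastn` for BLAST), and which binaries Plasmer actually calls is an assumption here. It does not check versions, Python modules, or R packages.

```shell
#!/bin/sh
# Sketch: report which commands are not found on PATH (helper name is hypothetical).

# Print the name of every command given as an argument that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names for the dependencies listed above.
missing=$(missing_tools seqkit python3 perl kmer-db prodigal hmmsearch \
    blastn cmscan diamond parallel kraken2 Rscript)

if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed tools were found."
fi
```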
Then download Plasmer from GitHub:
git clone https://github.com/nekokoe/Plasmer.git
cd Plasmer
export PATH=$PATH:$(pwd)
To add the Plasmer directory to your PATH environment variable permanently (run this from inside the Plasmer directory):
echo "export PATH=\$PATH:$(pwd)" >> ~/.bashrc && source ~/.bashrc
Plasmer -g input_fasta -p out_prefix -d db -t threads -m minimum_length -l length -o outpath
The parameters:
-h --help Print the help info and exit.
-v --version Print the version info.
-g --genome The input fasta. [required]
-p --prefix The prefix for intermediate files and results. [Default: output]
-d --db The path of pre-built Plasmer databases. [required]
-t --threads Number of threads. [Default: 8]
-m --minimum_length The minimum sequence length (bp). Sequences shorter than this are dropped. [Default: 500]
-l --length The length threshold (bp) above which sequences are treated as chromosome and filtered out. If set to 0, no sequences are filtered and all sequences are predicted. [Default: 500000]
-o --outpath The output directory. [required]
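As an illustration of the parameters above, the sketch below wraps a single run with the default values for -t, -m and -l and performs two basic checks first. The wrapper name `run_plasmer` and all paths in the usage comment are hypothetical; the absolute-path check reflects the instruction to give -d an absolute path.

```shell
#!/bin/sh
# Sketch: run Plasmer on one FASTA file with the documented default values.
# Arguments: $1 input FASTA, $2 prefix, $3 database directory, $4 output directory.
run_plasmer() {
    [ -f "$1" ] || { echo "input FASTA not found: $1" >&2; return 1; }
    case "$3" in
        /*) ;;
        *) echo "database path must be absolute: $3" >&2; return 1 ;;
    esac
    Plasmer -g "$1" -p "$2" -d "$3" -t 8 -m 500 -l 500000 -o "$4"
}

# Example usage (placeholder paths):
# run_plasmer /data/assembly.fasta sample1 /data/plasmerDB /data/plasmer_out
```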
With Docker, you don't have to install any of the dependencies. See the Docker documentation for more about Docker.
Download the Docker image first:
docker pull nekokoe/plasmer:latest
Assuming the input FASTA file is located at {inputfilepath}/input.fasta, run the following command to write the results to {outputfilepath}.
You can replace input.fasta with the actual name of your file.
docker run -d --rm --name plasmer \
-v {inputfilepath}:/input \
-v {outputfilepath}:/output \
-v {databasepath}:/db \
nekokoe/plasmer:latest \
/bin/sh /scripts/Plasmer \
-g /input/input.fasta \
-p {prefix} \
-d /db \
-t {threadnumber} \
-m 500 \
-l 500000 \
-o /output
Replace the placeholders with your own values:
- {inputfilepath}: Absolute path of the directory that contains input.fasta in your file system
- {outputfilepath}: Absolute path of the output directory in your file system
- {databasepath}: Absolute path of the downloaded pre-built Plasmer database
- {prefix}: Prefix for intermediate and output files
- {threadnumber}: Number of CPU threads to use
If you have many input files in one directory, we also provide a Bash script that runs the Docker container for you:
bash dockerrun_batch.sh /input/files/path /output/files/path /database/path CPU_threads minimum_length length
Five files are generated in outpath/results:
prefix.plasmer.predProb.tsv
prefix.plasmer.predClass.tsv
prefix.plasmer.predPlasmids.taxon
prefix.plasmer.predPlasmids.fa
prefix.plasmer.shorterM.fasta
Have a look at the result_example folder of the GitHub repository:
example.plasmer.predProb.tsv: The probability that each contig is a chromosome or a plasmid.
Contig | chromosome | plasmid |
---|---|---|
contig_1 | 0.832 | 0.168 |
contig_2 | 0.952 | 0.048 |
contig_3 | 0.022 | 0.978 |
contig_4 | 0.984 | 0.016 |
contig_5 | 0 | 1 |
contig_6 | 0 | 1 |
contig_7 | 0.906 | 0.094 |
contig_8 | 0 | 1 |
contig_9 | 0.84 | 0.16 |
contig_10 | 0 | 1 |
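If you want a stricter or looser cutoff than the predicted class gives you, the probability table can be filtered directly. This is a minimal sketch that assumes the file is tab-separated with one header row and the column order shown above (contig, chromosome, plasmid); the function name `plasmid_contigs` and the 0.9 cutoff are illustrative choices, not values used by Plasmer.

```shell
#!/bin/sh
# Sketch: list contigs whose plasmid probability meets a chosen cutoff.

# Print the contig name (column 1) of every row in file $1 whose
# plasmid probability (column 3) is at least the cutoff $2.
# The first line is skipped as a header (assumption about the file layout).
plasmid_contigs() {
    awk -F '\t' -v cutoff="$2" 'NR > 1 && $3 + 0 >= cutoff + 0 { print $1 }' "$1"
}

# Example usage (placeholder path):
# plasmid_contigs outpath/results/prefix.plasmer.predProb.tsv 0.9
```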
example.plasmer.predClass.tsv: The predicted class of each contig.
Contig | Type |
---|---|
contig_1 | chromosome |
contig_2 | chromosome |
contig_3 | plasmid |
contig_4 | chromosome |
contig_5 | plasmid |
contig_6 | plasmid |
contig_7 | chromosome |
contig_8 | plasmid |
contig_9 | chromosome |
contig_10 | plasmid |
example.plasmer.predPlasmids.taxon: The taxonomy of each predicted plasmid contig.
Contig | Taxonomy ID |
---|---|
contig_1 | Enterococcus faecium (taxid 1352) |
contig_2 | Enterococcus faecium (taxid 1352) |
contig_3 | Enterococcus faecium (taxid 1352) |
contig_4 | Enterococcus faecium (taxid 1352) |
contig_5 | Enterococcus faecium (taxid 1352) |
contig_6 | Enterococcus faecium Aus0085 (taxid 1305849) |
contig_7 | Enterococcus faecium (taxid 1352) |
contig_8 | Enterococcus faecium (taxid 1352) |
contig_9 | Enterococcus faecium (taxid 1352) |
contig_10 | Enterococcus faecium (taxid 1352) |
example.plasmer.predPlasmids.fa: The sequences of the predicted plasmid contigs.
prefix.plasmer.shorterM.fasta: The sequences filtered out by the -m parameter.
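To get a quick summary of a run, you can count the records in the two FASTA outputs. The sketch below counts header lines that start with ">"; the function name `count_fasta` and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: count sequence records in a FASTA file by counting ">" header lines.
count_fasta() {
    grep -c '^>' "$1"
}

# Example usage (placeholder paths):
# echo "predicted plasmid contigs: $(count_fasta outpath/results/prefix.plasmer.predPlasmids.fa)"
# echo "dropped by -m:             $(count_fasta outpath/results/prefix.plasmer.shorterM.fasta)"
```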
Download the results of other tools from Zenodo or Google Drive.
Feedback, bug reports, and suggestions are welcome. Please send them to nekokoe (at) qq.com and husn (at) im.ac.cn
This project is licensed under the terms of the MIT license.