
Primary language: Python. License: MIT.

ARGNet

A deep neural network for robust identification and annotation of antibiotic resistance genes.

The input can be long amino acid sequences (full-length or contigs), long nucleotide sequences, short amino acid reads (30-50 aa), or short nucleotide reads (100-150 nt), all in FASTA format. If your input is short reads, select the 'argnet-s' model; if your input is full-length sequences or contigs, select 'argnet-l' to make the prediction.
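As a rough illustration of the model choice described above, the sketch below reads FASTA text, takes the median sequence length, and suggests a model name. The helper names (`fasta_lengths`, `suggest_model`) are hypothetical and not part of ARGNet, and treating the upper ends of the quoted short-read ranges (50 aa, 150 nt) as hard cut-offs is an assumption made only for this example.

```python
from statistics import median

# Upper ends of the short-read ranges quoted in this README.
# Using them as strict cut-offs is an assumption of this sketch.
SHORT_READ_MAX = {"aa": 50, "nt": 150}


def fasta_lengths(fasta_text):
    """Return the length of every sequence in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def suggest_model(fasta_text, mol_type):
    """Suggest 'argnet-s' for read-length input and 'argnet-l' otherwise."""
    if mol_type not in SHORT_READ_MAX:
        raise ValueError("mol_type must be 'aa' or 'nt'")
    lengths = fasta_lengths(fasta_text)
    if not lengths:
        raise ValueError("no sequences found in the FASTA text")
    if median(lengths) <= SHORT_READ_MAX[mol_type]:
        return "argnet-s"
    return "argnet-l"
```

The returned string can be passed to the `--model` option. For mixed inputs the median is only a heuristic, so it may be safer to split reads and contigs into separate files.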


Installation

To install with git, run:

  git clone https://github.com/patience111/ARGNet

Requirements:

The programs were tested with the following package versions; you can install exactly these versions or other compatible ones.
Biopython: 1.79
tensorflow: 2.2.0
cuda: 10.2 (for GPU use)
cudnn: 7.6.5.32 (for GPU use)
numpy: 1.18.5
sklearn: 0.24.1
tqdm: 4.56.0
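To see how an existing environment compares with the tested versions listed above, a small check along the following lines may help. It is only a sketch: `compare_versions` is a hypothetical helper, the PyPI distribution names (`biopython`, `scikit-learn`) are assumed to correspond to the Biopython and sklearn entries, and CUDA/cuDNN are not Python packages, so they are not covered.

```python
from importlib.metadata import PackageNotFoundError, version

# Tested versions from the list above, keyed by the assumed PyPI distribution names.
TESTED_VERSIONS = {
    "biopython": "1.79",
    "tensorflow": "2.2.0",
    "numpy": "1.18.5",
    "scikit-learn": "0.24.1",
    "tqdm": "4.56.0",
}


def compare_versions(tested=TESTED_VERSIONS):
    """Map each package name to a (tested_version, installed_version_or_None) pair."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions().items():
        status = "not installed" if installed is None else installed
        print(f"{name}: tested with {wanted}, found {status}")
```

A mismatch is not necessarily a problem, since other compatible versions may also work.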

Quickstart Guide

For full-length sequences or contigs:
python argnet.py --input input_path_data --type aa/nt --model argnet-l --outname output_file_name

For short reads:
python argnet.py --input input_path_data --type aa/nt --model argnet-s --outname output_file_name

General options:
--input/-i      the test file as input
--type/-t       molecular type of your test data (aa for amino acid, nt for nucleotide)
--model/-m      the model used to make the prediction (argnet-l for long sequences, argnet-s for short reads)
--outname/-on   the output file name

optional arguments:
-h, --help show this help message and exit

-i INPUT, --input INPUT
the test data as input

-t {aa,nt}, --type {aa,nt}
molecular type of your input file

-m {argnet-s,argnet-l}, --model {argnet-s,argnet-l}
the model to make the prediction

-on OUTNAME, --outname OUTNAME
the name of results output
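For readers wrapping ARGNet in their own tooling, the help text above is consistent with an `argparse` parser along the following lines. This is a reconstruction from the listed options, not the code in argnet.py; in particular, marking every option as required is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options shown in the help text above."""
    parser = argparse.ArgumentParser(
        description="Sketch of an ARGNet-style command-line interface."
    )
    # required=True is an assumption; the help text does not say which options are mandatory.
    parser.add_argument("-i", "--input", required=True,
                        help="the test data as input")
    parser.add_argument("-t", "--type", required=True, choices=["aa", "nt"],
                        help="molecular type of your input file")
    parser.add_argument("-m", "--model", required=True,
                        choices=["argnet-s", "argnet-l"],
                        help="the model to make the prediction")
    parser.add_argument("-on", "--outname", required=True,
                        help="the name of results output")
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list so the sketch runs without command-line input.
    args = build_parser().parse_args(
        ["-i", "reads.fasta", "-t", "nt", "-m", "argnet-s", "-on", "out.txt"]
    )
    print(args.input, args.type, args.model, args.outname)
```

The `choices` lists make `argparse` reject an unknown molecular type or model name before any prediction starts.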

Example

To predict long amino acid contigs with the ARGNet-L model, run the following command from the ARGNet directory:

python3 ./scripts/argnet.py -i ./tests/aa/long/arg100p.fasta -t aa -m argnet-l -on argnet_lsaa_test.txt
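The same command can be launched from Python, which is convenient when processing several files. The sketch below only assembles and (optionally) runs the command line shown above. `build_argnet_command` and `run_argnet` are hypothetical helper names, and the default script path assumes the working directory is the ARGNet directory.

```python
import subprocess
import sys


def build_argnet_command(input_path, mol_type, model, outname,
                         script="./scripts/argnet.py"):
    """Return the argument list for one ARGNet run, using the documented flags."""
    if mol_type not in ("aa", "nt"):
        raise ValueError("mol_type must be 'aa' or 'nt'")
    if model not in ("argnet-s", "argnet-l"):
        raise ValueError("model must be 'argnet-s' or 'argnet-l'")
    return [sys.executable, script,
            "-i", input_path, "-t", mol_type, "-m", model, "-on", outname]


def run_argnet(*args, **kwargs):
    """Run ARGNet and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_argnet_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_argnet(...) to actually execute it.
    cmd = build_argnet_command("./tests/aa/long/arg100p.fasta", "aa",
                               "argnet-l", "argnet_lsaa_test.txt")
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.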

The output file contains four columns:
The first column, test_id, is the sequence label of the test sequence.
The second column, ARG_prediction, is the "ARG" or "non-ARG" prediction for the input sequence.
The third column, resistance_category, is the predicted antibiotic category (one of 36) that the input sequence confers resistance to.
The last column, probability, is the model's classification probability for the predicted resistance category.
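A results file with these four columns can be summarized with the standard library. The sketch below counts predicted ARGs per resistance category above a probability threshold. It assumes the file is tab-separated with a header row that uses the column names above, which this README does not confirm; `summarize_predictions` is a hypothetical helper, and the sample rows and category names are invented purely to make the example runnable.

```python
import csv
import io
from collections import Counter


def summarize_predictions(handle, min_probability=0.0):
    """Count ARG predictions per resistance category at or above a probability cut-off."""
    # Assumption: tab-separated text with a header row naming the four columns.
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["ARG_prediction"] != "ARG":
            continue  # skip non-ARG rows before reading their other fields
        if float(row["probability"]) >= min_probability:
            counts[row["resistance_category"]] += 1
    return counts


if __name__ == "__main__":
    # Invented sample rows; real ARGNet output may differ in layout and labels.
    sample = (
        "test_id\tARG_prediction\tresistance_category\tprobability\n"
        "seq1\tARG\tbeta-lactam\t0.99\n"
        "seq2\tnon-ARG\t\t\n"
        "seq3\tARG\ttetracycline\t0.95\n"
    )
    print(summarize_predictions(io.StringIO(sample), min_probability=0.5))
```

For a real run, replace the `io.StringIO` object with an open file handle, for example `open("argnet_lsaa_test.txt", newline="")`, and adjust the delimiter if the file turns out to use a different one.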

Contribute

If you'd like to contribute to ARGNet, check out https://github.com/patience111/argnet
We hope you enjoy using ARGNet. If you run into any problems, please contact scpeiyao@gmail.com