CompRanking is a pipeline for comprehensively assessing the key antimicrobial resistance (AMR) information in metagenomic samples and ranking their resistome risk. At the contig level, CompRanking calculates each piece of key AMR information: antibiotic resistance gene (ARG) abundance, ARG mobility, and the potential for ARGs to be acquired by pathogens. In general, CompRanking reports results derived from three features: the co-occurrence of ARGs and mobile genetic elements (MGEs) on the same contig, and their potential presence in pathogens.
Step 1: Change the current working directory to the location where you want the cloned CompRanking directory to be created.
Step 2: Clone the repository with git:
$ git clone https://github.com/GaoyangLuo/CompRanking
First, set up the environment with the following commands. They configure everything CompRanking needs.
$ cd CompRanking
$ conda env create -f CompRanking.yaml
$ bash setup.sh
CompRanking relies on multiple conda environments. Before running the demo test, you must set the absolute path to your miniconda bin directory. For example, your absolute bin path might be /home/username/miniconda3/bin.
How to set your absolute conda bin path:
First, open your test_yaml.yaml file:
$ vi test_yaml.yaml
Second, replace the placeholder with the real path to your miniconda/bin directory:
CompRanking:
abs_path_to_conda_bin: /your_real_path/miniconda/bin #don't use "~" or "./", please use absolute path
Please note: do not use a relative path, and do not use "~" or "./".
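The path rule above can be checked before launching the pipeline. The following is a minimal sketch, not part of CompRanking itself; the function name `check_conda_bin` is hypothetical and the check only covers what this README states (absolute path, no "~" or "./").

```python
import os


def check_conda_bin(path):
    """Return a list of problems with a candidate conda bin path.

    Hypothetical helper: it only enforces the rules stated in this README.
    """
    problems = []
    # "~" and "./" are both non-absolute, so isabs() rejects them.
    if path.startswith("~") or not os.path.isabs(path):
        problems.append("path must be absolute (no '~' or './')")
    elif not os.path.isdir(path):
        problems.append("directory does not exist")
    return problems


if __name__ == "__main__":
    for candidate in ("~/miniconda3/bin", "./miniconda3/bin", os.getcwd()):
        print(candidate, check_conda_bin(candidate))
```

An empty list means the value is safe to write into abs_path_to_conda_bin.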
You can download the databases from https://doi.org/10.5281/zenodo.8073486, or run the commands below.
$ wget -O CompRanking_database_v1.tar.gz "https://zenodo.org/record/8073486/files/CompRanking_database_v1.tar.gz?download=1"
$ wget -O localDB.zip "https://zenodo.org/record/8073486/files/localDB.zip?download=1"
$ tar -zxvf CompRanking_database_v1.tar.gz && mv CompRanking_database_v1 databases
$ unzip localDB.zip
We provide a test dataset. Run the demo with:
$ python cpr_multiprocess.py -i test_data -t 12 -r 1 -p test_demo
Step 1: Gene prediction. This step generates contextual AMR information and pathogen information for the whole metagenome. Run the command below:
$ python cpr_multiprocess.py -i <input_dir> -t <threads> -r <if_restart> -p <project_name_prefix>
Parameters:
- -i, <input_dir>: the directory containing all the fastq and fasta files. All files of the same sample must share an identical <prefix>. For example, FileNameOne_1.fq, FileNameOne_2.fq and FileNameOne.fa are the paired-end read fastq files (after quality control) and the assembly file. The assembly file contains the contigs; please do not filter them to a custom length, because default_min_length=500 cannot be altered.
- -t, <threads>: the number of threads to use for the run (Default=16).
- -r, <if_restart>: 0 or 1. 0 resumes from the last checkpoint; 1 restarts from the beginning.
- -p, <project_name_prefix> You should
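The file naming convention described for -i can be verified before a long run. This is a minimal sketch under the assumption that every sample needs exactly the three suffixes shown in the example (_1.fq, _2.fq, .fa); the helper name `find_incomplete_samples` is hypothetical and not part of CompRanking.

```python
from collections import defaultdict

# Suffixes taken from the FileNameOne example in this README.
REQUIRED_SUFFIXES = ("_1.fq", "_2.fq", ".fa")


def find_incomplete_samples(filenames):
    """Map each sample prefix to the required suffixes it is missing."""
    found = defaultdict(set)
    for name in filenames:
        for suffix in REQUIRED_SUFFIXES:
            if name.endswith(suffix):
                # Strip the suffix to recover the shared sample prefix.
                found[name[: -len(suffix)]].add(suffix)
                break
    return {
        prefix: sorted(set(REQUIRED_SUFFIXES) - suffixes)
        for prefix, suffixes in found.items()
        if suffixes != set(REQUIRED_SUFFIXES)
    }


if __name__ == "__main__":
    files = ["FileNameOne_1.fq", "FileNameOne_2.fq", "FileNameOne.fa", "Other_1.fq"]
    print(find_incomplete_samples(files))
```

In practice the list of names could come from `os.listdir(<input_dir>)`; an empty result means every detected prefix has all three files.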
Step 2: After all the prediction steps have finished, calculate the relative abundance of functional genes. Run the command below:
$ python ./compranking/multiGeneCal_metagenome_rpkg_scg_geneName.py \
    -i <input_dir> \
    -p <project_prefix> \
    -n AGS \
    -t 16 \
    -d <pth2KK2db> #this option is for cell copy normalized by sequence abundance, need to run multiGeneCal_16s.py
The output looks like this:
ARG_name | Class | Database | MGE_type | Sample_1 | Sample_2 | Sample_3 |
---|---|---|---|---|---|---|
AAC(2')-I | aminoglycoside | DeepARG | Unknown | 0.003 | 0.004 | 0.006 |
ERMB | macrolide | RGI | phage/plasmid | 0.002 | 0.003 | 0.004 |
SUL3 | sulfonamide | SARG | plasmid | 0.001 | 0.003 | 0.005 |
Note: phage/plasmid means the ARG was found co-located with both phage-like and plasmid-like contigs in one sample (microbial community). plasmid means the ARG was found co-located only with plasmid-like contigs. Unknown means no co-located MGE was detected. This does not prove that the ARG is not co-located with any MGE, because detection is limited by the accuracy and recall of the identification method.
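The labelling rule in the note can be illustrated with a short sketch. It assumes the label is simply the sorted, "/"-joined set of MGE types seen with an ARG in one sample, which reproduces the three labels in the table above; the function `mge_type_label` is hypothetical and may differ from how CompRanking builds the column.

```python
def mge_type_label(mge_types):
    """Build an MGE_type label from the MGE types co-located with one ARG.

    Assumption: types are de-duplicated, sorted, and joined with "/";
    an ARG with no detected MGE is labelled "Unknown".
    """
    kinds = sorted(set(mge_types))
    return "/".join(kinds) if kinds else "Unknown"


if __name__ == "__main__":
    print(mge_type_label(["plasmid", "phage", "plasmid"]))
    print(mge_type_label(["plasmid"]))
    print(mge_type_label([]))
```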
Step 3: Generate a risk score and the corresponding values for each sample. This step also reports summary counts, such as the number of ARG-carrying contigs and of phage- or plasmid-related contigs in your samples. Run the command below:
$ python ./compranking/baseInfoExtra_nContigs.py -i <input_dir> -p <project_name_prefix>
sample_name/index | nContigs | nARGs_contigs | nMGEs_contig | nMGEs_plasmid_contig | nMGEs_phage_contigs | nPAT_contigs | nARGs_MGEs_contig | nARGs_MGEs_plasmid_contigs | nARGs_MGEs_phage_contigs | nARGs_MGEs_PAT_contigs | fARG | fMGE | fMGE_plasmid | fMGE_phage | fPAT | fARG_MGE | fARG_MGE_plasmid | fARG_MGE_phage | fARG_MGE_PAT | score_pathogenic | score_phage | score_plasmid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sample2 | 433650 | 482 | 327780 | 270746 | 32238 | 397572 | 357 | 304 | 28 | 61 | 0.0011 | 0.7558 | 0.6247 | 0.0746 | 0.9168 | 0.0008 | 0.0007 | 6.4568 | 0.0001 | 23.1492 | 20.7351 | 22.7372 |
Sample1 | 433650 | 482 | 327780 | 270746 | 32238 | 397572 | 357 | 304 | 28 | 61 | 0.0011 | 0.7558 | 0.6247 | 0.0746 | 0.9168 | 0.0008 | 0.0007 | 6.4568 | 0.0001 | 23.1492 | 20.7351 | 22.7372 |
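Some of the f columns in the example rows are consistent with dividing the matching contig count by nContigs. The sketch below checks this reading for fARG, fPAT and fARG_MGE only. It is an observation about the example values, not the confirmed formula, and it does not attempt to reproduce the score columns, whose formula is not given here. The helper `contig_fraction` is hypothetical.

```python
def contig_fraction(count, n_contigs):
    """Fraction of contigs in a category, rounded to four decimals."""
    return round(count / n_contigs, 4)


# Counts copied from the example row above.
row = {
    "nContigs": 433650,
    "nARGs_contigs": 482,
    "nPAT_contigs": 397572,
    "nARGs_MGEs_contig": 357,
}

f_arg = contig_fraction(row["nARGs_contigs"], row["nContigs"])
f_pat = contig_fraction(row["nPAT_contigs"], row["nContigs"])
f_arg_mge = contig_fraction(row["nARGs_MGEs_contig"], row["nContigs"])

if __name__ == "__main__":
    print(f_arg, f_pat, f_arg_mge)
```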
Use the Jupyter notebook MGE_carried_ARGs_type_count.ipynb to count the types of MGEs that carry ARGs. The output records the number of elements of five types that co-occur with ARGs: plasmid, phage, unclassified (any type of sequence, including chromosome or other unknown or unidentified MGEs), IS (Insertion Sequence), and IE (Integrated Elements). The resulting table looks like this:
Class | Sample1_x | Sample2_y | Sample3_z | Sample3_m | Sample3_n |
---|---|---|---|---|---|
aminoglycoside | 33 | 35 | 36 | 37 | 38 |
macrolide | 44 | 45 | 46 | 37 | 38 |
tetracycline | 22 | 23 | 24 | 37 | 38 |
sampleName_x: #plasmid
sampleName_y: #phage
sampleName_z: #unclassified
sampleName_m: #IS
sampleName_n: #IE
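The suffix legend above can be turned into a small lookup when post-processing the table. This is a minimal sketch; the names `ELEMENT_BY_SUFFIX` and `split_column` are hypothetical, and it assumes the element suffix is always the text after the last underscore in the column name.

```python
# Suffix legend copied from the list above.
ELEMENT_BY_SUFFIX = {
    "x": "plasmid",
    "y": "phage",
    "z": "unclassified",
    "m": "IS",
    "n": "IE",
}


def split_column(column):
    """Split a column name such as 'Sample1_x' into (sample, element type)."""
    sample, _, suffix = column.rpartition("_")
    return sample, ELEMENT_BY_SUFFIX[suffix]


if __name__ == "__main__":
    for col in ("Sample1_x", "Sample2_y", "Sample3_z", "Sample3_m", "Sample3_n"):
        print(col, split_column(col))
```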
Every process generates a checkpoint file in the checkdone directory, with a file name like <Your_Project_Name>.index_build.done. To resume the pipeline from the last interrupted step, set the parameter -r to 0, which means the pipeline does not re-run from the beginning. Setting it to 1 re-runs from the beginning. You can also delete a .done file to re-run that specific step. The pipeline uses these files to identify which steps have been completed and which have not.