Plasmidome investigations to address the main transfer route of antimicrobial resistance (AMR) genes in bacteria
The project aims to analyze a large-scale genomic dataset (441,120 RefSeq plasmids) to annotate the plasmids and determine the presence or absence of AMR genes. Given the size of this dataset, we first tested the pipeline workflow on a small dataset containing 262 FASTA sequences. More details about the trial data can be found here.
In this directory, which contains the original data of our project, we first fetched 66,147 FASTA records. This script was then used to fetch the remaining sequences without fetch errors, bringing the total to 441,120 FASTA records.
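The resume step described above can be illustrated with a minimal sketch. It is not the project's fetching script: the function names and the accession strings in the usage note are hypothetical, and it assumes that the first whitespace-delimited token of each FASTA header is the record's accession. It only works out which records are still missing and groups them into batches; the download call itself is left out.

```python
from typing import Iterable, Iterator, List, Set


def fetched_accessions(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect the accession (first token after '>') from each FASTA header."""
    done: Set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                done.add(parts[0])
    return done


def remaining_accessions(wanted: Iterable[str], done: Set[str]) -> List[str]:
    """Keep the wanted order, dropping accessions already fetched and duplicates."""
    seen = set(done)
    todo: List[str] = []
    for acc in wanted:
        if acc not in seen:
            seen.add(acc)
            todo.append(acc)
    return todo


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items (size must be positive)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For example, with the headers `>NZ_A.1 ...` and `>NZ_B.1 ...` already on disk and a wanted list of `NZ_A.1`, `NZ_C.1`, `NZ_D.1`, only `NZ_C.1` and `NZ_D.1` would be queued for the next fetch. Working in small batches means a failed request costs one batch rather than the whole run.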
The pipeline workflow is structured as follows:
plasmidfinder.py is utilized to identify plasmid incompatibility types.
For plasmidome annotation, we used the Prokka tool. More information can be found in Torsten Seemann's paper.
For the pan-plasmidome investigation, we used the Panaroo pipeline. More information can be found here.
The ABRicate package was used for mass screening of virulence genes.
Resistance Gene Identifier (RGI) was used to predict the antibiotic resistomes of the RefSeq plasmids retrieved from NCBI. More information can be found in the CARD database.
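As a rough illustration of how the tools listed above might be chained for one plasmid, the sketch below only builds the command lines and does not run anything. The output layout (`results/<tool>/<sample>`), the function names, the choice of the `vfdb` database for ABRicate, and the `strict` clean mode for Panaroo are all assumptions for the example, not settings taken from this project. The exact options used in the real runs may differ, and plasmidfinder.py may additionally need a database path and an existing output directory.

```python
from pathlib import Path
from typing import Dict, List


def build_commands(fasta: str, out_root: str = "results") -> Dict[str, List[str]]:
    """Build per-plasmid command lines; the output layout is an assumption."""
    sample = Path(fasta).stem
    out = Path(out_root)
    return {
        # Incompatibility typing
        "plasmidfinder": ["plasmidfinder.py", "-i", fasta,
                          "-o", str(out / "plasmidfinder" / sample)],
        # Annotation; Prokka writes a GFF file that Panaroo takes as input
        "prokka": ["prokka", "--outdir", str(out / "prokka" / sample),
                   "--prefix", sample, fasta],
        # Virulence gene screening; ABRicate prints its report to stdout
        "abricate": ["abricate", "--db", "vfdb", fasta],
        # Resistome prediction against CARD
        "rgi": ["rgi", "main", "--input_sequence", fasta,
                "--output_file", str(out / "rgi" / sample),
                "--input_type", "contig"],
    }


def panaroo_command(gff_files: List[str], out_dir: str) -> List[str]:
    """Panaroo runs once over all Prokka GFF files rather than per plasmid."""
    return ["panaroo", "-i", *gff_files, "-o", out_dir, "--clean-mode", "strict"]
```

Each list can be handed to `subprocess.run(cmd, check=True)`. For ABRicate, the standard output would need to be captured or redirected to a file to keep the report.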