Plasmidome investigations to address the main transfer route of antimicrobial resistance (AMR) genes in bacteria

This project is ongoing. This README will be updated as the work progresses.

The project aims to analyze a large-scale genomic dataset (441,120 RefSeq plasmids) to annotate the plasmids and determine the presence and absence of AMR genes. Because of the size of this dataset, we first tested the pipeline workflow on a small trial dataset containing 262 FASTA sequences. More details about the trial data can be found here.
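The README does not say how the 262 trial sequences were chosen. One reproducible way to draw such a trial set is a seeded random sample of records from a multi-FASTA file. The sketch below assumes plain-text FASTA input; the helper names (`read_fasta`, `sample_records`, `write_fasta`) and the default seed are hypothetical and are not part of the project's scripts.

```python
from __future__ import annotations

import random
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def sample_records(records: Iterable[tuple[str, str]], k: int, seed: int = 42) -> list[tuple[str, str]]:
    """Draw k records without replacement; the same seed gives the same trial set."""
    records = list(records)
    if k > len(records):
        raise ValueError(f"requested {k} records but only {len(records)} available")
    return random.Random(seed).sample(records, k)


def write_fasta(records: Iterable[tuple[str, str]], width: int = 60) -> str:
    """Serialise records back to FASTA text, wrapping sequences at `width` characters."""
    out = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

For example, `with open("plasmids.fna") as fh: trial = sample_records(read_fasta(fh), 262)` would give a fixed subset that can be regenerated later with the same seed (the file name is a placeholder).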

This directory contains the original data of our project. We initially fetched 66,147 FASTA records; this script was then used to fetch the remaining sequences without fetching errors, for a total of 441,120 FASTA records.
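The fetching script itself is not reproduced here. As an illustration of how a download can be resumed, one can compare the full accession list against the accessions already present in the downloaded FASTA file and fetch only the difference in batches. This sketch assumes the accession is the first whitespace-separated token of each FASTA header (as in RefSeq headers such as `>NZ_CP012345.1 ...`); the helper names and the batch size are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def fetched_accessions(fasta_lines: Iterable[str]) -> set[str]:
    """Collect accession IDs (first header token) already present in a FASTA file."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def remaining_accessions(all_ids: Iterable[str], done: set[str]) -> list[str]:
    """Return accessions not yet fetched, keeping input order and dropping duplicates."""
    seen: set[str] = set()
    remaining = []
    for acc in all_ids:
        acc = acc.strip()
        if acc and acc not in done and acc not in seen:
            seen.add(acc)
            remaining.append(acc)
    return remaining


def batches(ids: list[str], size: int = 200) -> Iterator[list[str]]:
    """Split the remaining accessions into fixed-size batches for download requests."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for i in range(0, len(ids), size):
        yield ids[i:i + size]
```

Each batch could then be passed to a downloader such as Biopython's `Entrez.efetch(db="nuccore", id=",".join(batch), rettype="fasta", retmode="text")`; whether the original script used Biopython is not stated, so treat this as one possible choice. Re-running the comparison after every batch makes the process restartable after network errors.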

The pipeline workflow is organized into the following steps:

Identify Inc types

plasmidfinder.py is used to identify plasmid incompatibility (Inc) types.
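Once PlasmidFinder has run, its hits can be summarised per plasmid, for example to count Inc types or to find multi-replicon plasmids. The sketch below assumes a tab-separated report with a column holding the hit name (called `Plasmid` here) and a column holding the query sequence header (`Contig`); check these names against the output of your PlasmidFinder version. Reducing a hit name such as `IncFIB(K)_1_Kpn3` to `IncFIB(K)` by cutting at the first underscore is a simplifying assumption, and `summarise_inc_types` is a hypothetical helper.

```python
from __future__ import annotations

import csv
import io
from collections import Counter, defaultdict


def replicon_type(hit_name: str) -> str:
    """Drop allele/reference suffixes, e.g. 'IncFIB(K)_1_Kpn3' -> 'IncFIB(K)' (assumed naming)."""
    return hit_name.split("_", 1)[0]


def summarise_inc_types(tsv_text: str, plasmid_col: str = "Plasmid", contig_col: str = "Contig"):
    """Return (Inc types per plasmid accession, number of plasmids carrying each Inc type)."""
    per_plasmid: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Keep only the accession, i.e. the first token of the query header.
        accession = row[contig_col].strip().partition(" ")[0]
        per_plasmid[accession].add(replicon_type(row[plasmid_col]))
    counts = Counter(t for types in per_plasmid.values() for t in types)
    return dict(per_plasmid), counts
```

Multi-replicon plasmids can then be listed with `[acc for acc, types in per_plasmid.items() if len(types) > 1]`.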

Whole genome annotation

For plasmidome annotation, we used the Prokka tool. More information can be found in Torsten Seemann's paper.
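With hundreds of thousands of plasmids, Prokka is usually driven by a small batch script rather than by hand. The sketch below only builds one command per FASTA file using standard Prokka options (`--outdir`, `--prefix`, `--kingdom`, `--cpus`) and runs them when `dry_run` is false. The output layout (`prokka_out/<accession>`), the CPU count and the helper names are assumptions for illustration, not the project's actual settings.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def prokka_command(fasta: str, out_root: str = "prokka_out", cpus: int = 4) -> list[str]:
    """Build one Prokka command line for a single plasmid FASTA file (hypothetical helper)."""
    sample = Path(fasta).stem  # e.g. "NZ_CP1.1" from "NZ_CP1.1.fna"
    return [
        "prokka",
        "--outdir", f"{out_root}/{sample}",  # one output directory per plasmid
        "--prefix", sample,                  # output files named after the plasmid
        "--kingdom", "Bacteria",
        "--cpus", str(cpus),
        fasta,
    ]


def run_prokka(fastas: list[str], out_root: str = "prokka_out", cpus: int = 4,
               dry_run: bool = True) -> list[str]:
    """Return the commands as shell strings; execute them only when dry_run is False."""
    commands = [prokka_command(f, out_root, cpus) for f in fastas]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing annotation
    return [shlex.join(cmd) for cmd in commands]
```

Running with `dry_run=True` first prints the exact commands, which can also be written to a file and distributed over a cluster scheduler instead of being run sequentially.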

Pan-Plasmidome investigation

For the pan-plasmidome investigation, we used the Panaroo pipeline. More information can be found here.
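Panaroo is typically run on the GFF3 files produced by Prokka (for example `panaroo -i prokka_out/*/*.gff -o panaroo_out --clean-mode strict`) and writes a gene presence/absence matrix, `gene_presence_absence.Rtab`, with genes as rows, genomes as columns and 0/1 entries. The sketch below classifies genes by the fraction of plasmids carrying them. The thresholds (99 % core, 95 % soft core, 15 % shell, the rest cloud) follow the common Roary-style convention and are an assumption, as is the helper name `classify_genes`.

```python
from __future__ import annotations

import csv
import io


def classify_genes(rtab_text: str, core: float = 0.99, soft_core: float = 0.95,
                   shell: float = 0.15) -> dict[str, str]:
    """Label each gene in a presence/absence Rtab as core, soft_core, shell or cloud."""
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    n_genomes = len(header) - 1  # first column holds the gene name
    if n_genomes == 0:
        raise ValueError("Rtab has no genome columns")
    classes = {}
    for row in reader:
        if not row:
            continue
        present = sum(int(v) > 0 for v in row[1:])
        frac = present / n_genomes
        if frac >= core:
            label = "core"
        elif frac >= soft_core:
            label = "soft_core"
        elif frac >= shell:
            label = "shell"
        else:
            label = "cloud"
        classes[row[0]] = label
    return classes
```

Because plasmids share far fewer genes than chromosomes, most genes are expected to fall into the shell and cloud categories; `collections.Counter(classify_genes(text).values())` gives the category sizes.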

Mass screening of contigs for virulence genes

The ABRicate package was used for mass screening of the contigs for virulence genes.
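ABRicate can screen sequences against its bundled virulence database (for example `abricate --db vfdb plasmids.fna > vf.tsv`) and reports one tab-separated row per hit, including the `SEQUENCE`, `GENE`, `%COVERAGE` and `%IDENTITY` columns. The sketch below collects, per sequence, the genes that pass identity and coverage cut-offs. The 80 % thresholds are illustrative rather than the project's settings, and `virulence_hits` is a hypothetical helper.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def virulence_hits(tsv_text: str, min_identity: float = 80.0,
                   min_coverage: float = 80.0) -> dict[str, set[str]]:
    """Map each screened sequence to the virulence genes that pass both cut-offs."""
    hits: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["%IDENTITY"] == "%IDENTITY":
            continue  # repeated header line from concatenated reports
        if (float(row["%IDENTITY"]) >= min_identity
                and float(row["%COVERAGE"]) >= min_coverage):
            hits[row["SEQUENCE"]].add(row["GENE"])
    return dict(hits)
```

Skipping repeated header lines lets the reports of many ABRicate runs be concatenated into one file before parsing.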

AMR gene discovery

The Resistance Gene Identifier (RGI) was used to predict the antibiotic resistomes of the RefSeq plasmids retrieved from NCBI. More information can be found in the Comprehensive Antibiotic Resistance Database (CARD).
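To answer the presence/absence question, RGI results (for example from `rgi main --input_sequence plasmids.fna --output_file rgi_out --input_type contig`) can be turned into a plasmid-by-gene matrix. The sketch below assumes RGI's tab-separated report with `Contig`, `Cut_Off` and `Best_Hit_ARO` columns and keeps only `Perfect` and `Strict` hits. Depending on RGI version and input type, the `Contig` value may need trimming to the plasmid accession, so that mapping is left as the `plasmid_of` parameter. Plasmids without any hit must be supplied via `all_plasmids` to appear as all-zero rows. The helper name is hypothetical.

```python
from __future__ import annotations

import csv
import io
from typing import Callable, Iterable


def amr_matrix(tsv_text: str, cutoffs: tuple[str, ...] = ("Perfect", "Strict"),
               plasmid_of: Callable[[str], str] = lambda contig: contig,
               all_plasmids: Iterable[str] = ()) -> tuple[list[str], dict[str, list[int]]]:
    """Build a 0/1 presence/absence matrix of AMR genes (columns) per plasmid (rows)."""
    presence: dict[str, set[str]] = {p: set() for p in all_plasmids}
    genes: set[str] = set()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["Cut_Off"] not in cutoffs:
            continue  # e.g. drop 'Loose' hits
        gene = row["Best_Hit_ARO"]
        presence.setdefault(plasmid_of(row["Contig"]), set()).add(gene)
        genes.add(gene)
    columns = sorted(genes)
    matrix = {p: [int(g in found) for g in columns] for p, found in sorted(presence.items())}
    return columns, matrix
```

Supplying the full accession list through `all_plasmids` is what makes absence explicit: a plasmid missing from the RGI report is otherwise indistinguishable from one that was never screened.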

IbrahimElzahaby/Erasmus_MC_Internship