Finding Tn-Seq Essential genes (FiTnEss)
Poster: FiTnEss_poster.pdf
FiTnEss is an R package that uses transposon insertion sequencing (Tn-Seq) data to identify essential genes in a genome.
Original paper on bioRxiv: Defining the core essential genome of Pseudomonas aeruginosa
Publication: Defining the core essential genome of Pseudomonas aeruginosa
After installing the FiTnEss package, run the main function, FiTnEss_Run.
Its arguments are:
- strain: name of the strain being analyzed, e.g. "PA14"
- file_location: path and file name of the tally file for the run:
e.g.
"/home/your_folder/your_tally.txt"
- permissive_file: path and file name of the non-permissive TA site file generated in the genomic pre-processing step:
e.g.
"/home/your_folder/non_permissive_TA_sites.txt"
- homologous_file: path and file name of the homologous TA site file generated in the pre-processing step:
e.g.
"/home/your_folder/homologous_TA_sites.txt"
- gene_file: path and file name of the GFF3 gene annotation file. A GFF3 file can be downloaded from, for example, the Pseudomonas Genome Database:
e.g.
"/your/folder/location/your_gff3_file.txt"
- save_location: path and file name under which to save the final results file:
e.g.
"/home/results_folder/results.xlsx"
- repeat_time: number of times to run the pipeline in order to obtain the best results. By default, we run it 3 times.
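Because several of the arguments above are file paths, it can help to check them before starting a run. The following is a minimal sketch in base R, not part of FiTnEss: check_fitness_inputs is a hypothetical helper, and it only assumes that the four input files must already exist and that the folder for save_location must exist.

```r
# Hypothetical helper (NOT part of the FiTnEss package): check the path
# arguments described above before calling FiTnEss_Run.
check_fitness_inputs <- function(file_location, permissive_file,
                                 homologous_file, gene_file, save_location) {
  # Input files that are assumed to exist before the run.
  inputs <- c(file_location   = file_location,
              permissive_file = permissive_file,
              homologous_file = homologous_file,
              gene_file       = gene_file)

  # Keep only the inputs that cannot be found on disk.
  missing_inputs <- inputs[!file.exists(inputs)]
  problems <- sprintf("%s not found: %s",
                      names(missing_inputs), missing_inputs)

  # Assumption: the results file is written into an existing folder.
  if (!dir.exists(dirname(save_location))) {
    problems <- c(problems,
                  sprintf("folder for save_location does not exist: %s",
                          dirname(save_location)))
  }

  # An empty character vector means no problems were found.
  problems
}

# Usage with the placeholder paths from the argument list above.
problems <- check_fitness_inputs(
  "/home/your_folder/your_tally.txt",
  "/home/your_folder/non_permissive_TA_sites.txt",
  "/home/your_folder/homologous_TA_sites.txt",
  "/your/folder/location/your_gff3_file.txt",
  "/home/results_folder/results.xlsx"
)
if (length(problems) > 0) message(paste(problems, collapse = "\n"))
```

The helper returns the problems instead of stopping, so all missing paths are reported at once.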
install.packages("devtools")
devtools::install_github("ruy204/FiTnEss")

Packages <- c("dplyr","fBasics","goftest","openxlsx","scales","stats","tidyr")
invisible(lapply(Packages, library, character.only = TRUE))
library(FiTnEss)

FiTnEss_Run("PA14",
            "/your/folder/location/Test_set_P_aeruginosa/sample_data/PA14_M9_rep1_tally.txt",
            "/your/folder/location/Test_set_P_aeruginosa/TAsite_info/nonpermissive_TA_sites.txt",
            "/your/folder/location/Test_set_P_aeruginosa/TAsite_info/homologous_TA_sites.txt",
            "/your/folder/location/Test_set_P_aeruginosa/genome_info/PA14_gff.txt",
            "/your/folder/location/Test_set_P_aeruginosa/sample_data/test_results.xlsx",
            repeat_time = 3)
Locus.CIA | gtot | Nta | pvalue | padj | Ess_fwer | pfdr | Ess_fdr |
---|---|---|---|---|---|---|---|
PA14_00410 | 5 | 1 | 0.015989 | 1 | NE_fwer | 0.093033 | NE_fdr |
Each tab in the .xlsx file holds the results from one replicate. Each results table has 8 columns:
- Locus.CIA: gene index
- gtot: total reads for the gene
- Nta: number of TA sites in this gene
- pvalue: unadjusted p-value of being essential
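- padj: p-value adjusted to control the family-wise error rate (FWER)
- Ess_fwer: confident essential category (FWER-based call)
- pfdr: p-value adjusted to control the false discovery rate (FDR)
- Ess_fdr: candidate essential category (FDR-based call)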
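The results workbook can be inspected from R with the columns described above. The following is a minimal sketch, not part of FiTnEss: read_fitness_results and summarise_calls are hypothetical helpers, and the only category labels it relies on are the ones that appear in the example row (NE_fwer, NE_fdr). Reading the workbook uses getSheetNames and read.xlsx from the openxlsx package.

```r
# Hypothetical helpers (NOT part of the FiTnEss package) for inspecting
# the results workbook written to save_location.

# Read every tab (one per replicate) into a named list of data frames.
# Needs the openxlsx package to be installed when it is called.
read_fitness_results <- function(path) {
  sheets <- openxlsx::getSheetNames(path)
  results <- lapply(sheets, function(s) openxlsx::read.xlsx(path, sheet = s))
  names(results) <- sheets
  results
}

# Count how many genes fall into each category, per replicate.
# No label names are assumed; whatever values occur are tabulated.
summarise_calls <- function(results) {
  lapply(results, function(tab) {
    list(fwer = table(tab$Ess_fwer),
         fdr  = table(tab$Ess_fdr))
  })
}

# Toy input: the single example row shown above, treated as one replicate.
# The tab name "rep1" is made up for illustration.
example_results <- list(rep1 = data.frame(
  Locus.CIA = "PA14_00410", gtot = 5, Nta = 1,
  pvalue = 0.015989, padj = 1, Ess_fwer = "NE_fwer",
  pfdr = 0.093033, Ess_fdr = "NE_fdr",
  stringsAsFactors = FALSE
))

calls <- summarise_calls(example_results)
print(calls$rep1$fwer)
print(calls$rep1$fdr)
```

With a real results file, the same summary would be obtained by passing the output of read_fitness_results (called with the save_location path) to summarise_calls.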