Title: GTEvaluator README
Author: Arnaud Felten, Laurent Guillier
Affiliation: Food Safety Laboratory – ANSES Maisons Alfort (France)
You can find the latest version of the tool at https://github.com/afelten-Anses/GTEvaluator-1.0
HTML and PDF technical user documentation is available in the 'docs/' directory.
A sample dataset to test the workflow is available in the 'dataset/' directory.
GTEvaluator
GTEvaluator (GenoTargetFinder) is a workflow that selects the most specific and sensitive markers from a list of candidate markers and whole-genome sequencing data.
GTEvaluator is composed of three scripts written in Python:
- GTEvaluator_matrixMaker matches targets against sequences and builds a matrix table with 1 if the target is present and 0 otherwise;
- GTEvaluator_statistic computes the specificity and sensitivity of each target for each variant from the matrix;
- GTEvaluator is a driver script that runs GTEvaluator_matrixMaker and then GTEvaluator_statistic.
Each script can be run separately.
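The presence/absence call behind GTEvaluator_matrixMaker can be sketched in a few lines of Python 3. This is a minimal sketch, not the script's code: it assumes exact matching on both strands as a stand-in for EMBOSS fuzznuc (which also handles mismatches and degenerate bases), and it assumes that a primer pair counts as present when the forward primer and the binding site of the reverse primer delimit a region no longer than the '-m' distance. All function names are hypothetical.

```python
# Sketch of the presence/absence call behind GTEvaluator_matrixMaker.
# Assumption: exact matching on both strands replaces EMBOSS fuzznuc.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(sequence, pattern):
    """Return every start position of pattern in sequence (overlaps allowed)."""
    hits = []
    start = sequence.find(pattern)
    while start != -1:
        hits.append(start)
        start = sequence.find(pattern, start + 1)
    return hits


def target_present(genome, target):
    """Single target: present if it occurs on either strand."""
    return target in genome or reverse_complement(target) in genome


def amplicon_present(genome, forward, reverse, max_distance=1400):
    """Primer pair: present if, on either strand, the forward primer is followed
    by the reverse primer's binding site within max_distance nucleotides
    (assumed meaning of '-m', measured from primer start to binding-site end)."""
    binding_site = reverse_complement(reverse)
    for strand in (genome, reverse_complement(genome)):
        for f in find_all(strand, forward):
            for r in find_all(strand, binding_site):
                if r >= f and r + len(binding_site) - f <= max_distance:
                    return True
    return False


def matrix_row(genome, targets, max_distance=1400):
    """Return 1/0 per target (sorted by name); targets maps name -> (forward, reverse or None)."""
    row = []
    for name in sorted(targets):
        forward, reverse = targets[name]
        if reverse:
            hit = amplicon_present(genome, forward, reverse, max_distance)
        else:
            hit = target_present(genome, forward)
        row.append(1 if hit else 0)
    return row
```

One row of the matrix is then `matrix_row(genome_sequence, targets)` for each genome; a real implementation would also handle multi-record FASTA files and approximate matches.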
Quick Start
Running on Linux/Mac OS X
From the 'src/' directory, run:
./GTEvaluator
or:
./GTEvaluator_matrixMaker
or:
./GTEvaluator_statistic
We recommend adding the scripts to your $PATH. From the repository root directory, run:
export PATH=$PATH:$(pwd)/src
Dependencies
GTEvaluator requires Python 2.7 (tested with 2.7.6) and the following libraries:
GTEvaluator requires 'fuzznuc' from EMBOSS to match targets against sequences. EMBOSS can be downloaded from:
http://emboss.sourceforge.net/download/
GTEvaluator was tested with EMBOSS 6.6.0.0.
GTEvaluator Parameters
GTEvaluator_matrixMaker parameters
- '-i': tab-separated list of genome files with their serovar names. See 'test_dataset/Bacillus_genomes.txt' for an example.
- '-p': target TSV file with the target name in column 1 and the target sequence in column 2. For primers, the forward sequence must be in column 2 and the reverse sequence in column 3 (leave column 3 empty for single targets). See 'test_dataset/Bacillus_targets.tsv' for an example.
- '-m': maximum distance between the forward and reverse primers ['1400']
- '-T': maximum number of threads to use ['1']
- '-o': output name ['output']
- '--trim': number of nucleotides to trim at the 5' end ['0']
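The two input files can be read with the standard csv module. This is a minimal Python 3 sketch, assuming the genome list holds the genome file in column 1 and the serovar in column 2, and that '--trim' removes the given number of nucleotides from the 5' end of every target and primer; read_genome_list and read_targets are hypothetical helpers, not functions of GTEvaluator.

```python
import csv
import io


def read_genome_list(handle):
    """Return (genome_file, serovar) pairs from the tab-separated '-i' file."""
    pairs = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) >= 2 and fields[0].strip():
            pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def read_targets(handle, trim=0):
    """Return {name: (forward, reverse or None)} from the '-p' TSV file.

    Column 3 (reverse primer) is absent or empty for single targets.
    trim: nucleotides removed from the 5' end of each sequence (assumed '--trim').
    """
    targets = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 2 or not fields[0].strip():
            continue  # skip blank or incomplete lines
        forward = fields[1].strip().upper()[trim:]
        reverse = fields[2].strip().upper() if len(fields) > 2 else ""
        targets[fields[0].strip()] = (forward, reverse[trim:] if reverse else None)
    return targets


if __name__ == "__main__":
    # Hypothetical in-memory example instead of the files in test_dataset/.
    genomes = read_genome_list(io.StringIO("genome1.fasta\tgroupA\n"))
    targets = read_targets(io.StringIO("t1\tACGTACG\nprimers1\tACGTACG\tAAAGGG\n"), trim=2)
    print(genomes)
    print(targets)
```

With real files, pass an open handle instead, e.g. `open("Bacillus_targets.tsv", newline="")`.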
GTEvaluator_statistic parameters
- '-i': matrix file generated by 'GTEvaluator_matrixMaker'
- '-o': output prefix name ['output']
- '--CiMin': lower bound (quantile) of the confidence interval ['0.025']
- '--CiMax': upper bound (quantile) of the confidence interval ['0.975']
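The README does not say how the confidence interval is computed. One common way to turn '--CiMin' and '--CiMax' into an interval for a proportion such as sensitivity is a percentile bootstrap, sketched below in Python 3; treat it as an illustration of what the two quantiles mean (0.025 and 0.975 give a 95% interval), not as the method GTEvaluator_statistic implements. bootstrap_ci, quantile and the n_boot and seed arguments are hypothetical.

```python
import random


def quantile(sorted_values, q):
    """Nearest-rank quantile of an already sorted list, q in [0, 1]."""
    index = int(round(q * (len(sorted_values) - 1)))
    return sorted_values[min(max(index, 0), len(sorted_values) - 1)]


def bootstrap_ci(outcomes, ci_min=0.025, ci_max=0.975, n_boot=2000, seed=0):
    """Percentile-bootstrap interval for a proportion.

    outcomes: 0/1 per genome, e.g. 1 when the target is present in a genome
    of the subgroup (for sensitivity). ci_min/ci_max play the role of
    '--CiMin'/'--CiMax'.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(outcomes)
    # Resample the genomes with replacement and recompute the proportion.
    estimates = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / float(n)
        for _ in range(n_boot)
    )
    return quantile(estimates, ci_min), quantile(estimates, ci_max)
```

For example, `bootstrap_ci([1] * 8 + [0] * 2)` returns an interval around a sensitivity of 0.8; with few genomes per subgroup the interval is wide, which is the point of reporting it.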
The 'GTEvaluator' script accepts the parameters of both 'GTEvaluator_matrixMaker' and 'GTEvaluator_statistic'.
GTEvaluator programs
'GTEvaluator_matrixMaker' uses 'fuzznuc' to match targets against sequences, so 'fuzznuc' must be in your $PATH.
'GTEvaluator' runs 'GTEvaluator_matrixMaker' and 'GTEvaluator_statistic', so both scripts must be in your $PATH.
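Before running the workflow, you can check that all three programs are reachable. A small Python 3 sketch using shutil.which from the standard library; missing_programs is a hypothetical helper, not part of GTEvaluator.

```python
import shutil

REQUIRED = ("fuzznuc", "GTEvaluator_matrixMaker", "GTEvaluator_statistic")


def missing_programs(names=REQUIRED):
    """Return the programs that cannot be found on $PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_programs()
    if missing:
        print("Not found in $PATH: " + ", ".join(missing))
    else:
        print("All required programs are on $PATH.")
```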
Outputs
GTEvaluator_matrixMaker output
GTEvaluator_matrixMaker generates a tab-separated file. Its first column contains the genome file name and its second column contains the subgroup name (e.g. subtype, serovar) of the genome. The remaining columns record the presence ('1') or absence ('0') of each target.
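A Python 3 sketch of writing and reading this matrix format with the csv module. It assumes a header line (genome, subgroup, then one column per target name); check an actual GTEvaluator_matrixMaker output to confirm the header and column names. write_matrix and read_matrix are hypothetical helpers.

```python
import csv
import io


def write_matrix(handle, rows, target_names):
    """Write genome, subgroup, then one 1/0 column per target.

    rows: list of (genome_file, subgroup, {target_name: bool}).
    Assumption: a header line is written first.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["genome", "subgroup"] + list(target_names))
    for genome, subgroup, hits in rows:
        writer.writerow([genome, subgroup] + [1 if hits[t] else 0 for t in target_names])


def read_matrix(handle):
    """Return (target_names, [(genome_file, subgroup, [1/0, ...]), ...])."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [(r[0], r[1], [int(v) for v in r[2:]]) for r in reader if r]
    return header[2:], rows


if __name__ == "__main__":
    buffer = io.StringIO()
    write_matrix(buffer, [("genome1.fasta", "groupA", {"t1": True, "t2": False})], ["t1", "t2"])
    print(buffer.getvalue())
```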
GTEvaluator_statistic output
GTEvaluator_statistic generates graphical outputs. For each subgroup, a plot shows the sensitivity and specificity of all targets. The correspondence between target names and point numbers is written in the tabular output of GTEvaluator_statistic. If there are fewer than 6 subgroups, a global plot (all targets for all subgroups) is generated.
The other type of output is a tabular file. For each subgroup and each target, the sensitivity, the specificity, the statistical distance and the confidence interval are computed. Targets are ordered by distance.
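Sensitivity and specificity follow the usual definitions for a subgroup g and a target t: sensitivity is the fraction of genomes in g in which t is present, and specificity is the fraction of genomes outside g in which t is absent. The README does not define the "statistical distance"; the Python 3 sketch below assumes the Euclidean distance to a perfect marker (sensitivity = specificity = 1), which fits the sensitivity/specificity plots but may differ from what GTEvaluator_statistic computes. evaluate is a hypothetical helper working on rows shaped like the matrix file.

```python
import math


def evaluate(rows, target_names):
    """Return (subgroup, target, sensitivity, specificity, distance) tuples,
    sorted by subgroup, then by increasing distance (best markers first).

    rows: list of (genome_file, subgroup, [1/0 per target]).
    Assumption: distance is the Euclidean distance to the point (1, 1).
    """
    subgroups = sorted({subgroup for _, subgroup, _ in rows})
    if len(subgroups) < 2:
        raise ValueError("specificity needs at least two subgroups")
    results = []
    for group in subgroups:
        inside = [calls for _, g, calls in rows if g == group]
        outside = [calls for _, g, calls in rows if g != group]
        for j, target in enumerate(target_names):
            # Sensitivity: target present in genomes of the subgroup.
            sens = sum(c[j] for c in inside) / float(len(inside))
            # Specificity: target absent from genomes of the other subgroups.
            spec = sum(1 - c[j] for c in outside) / float(len(outside))
            dist = math.hypot(1.0 - sens, 1.0 - spec)
            results.append((group, target, sens, spec, dist))
    results.sort(key=lambda r: (r[0], r[4]))
    return results
```

A target with distance 0 is present in every genome of its subgroup and absent from all others.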
GTEvaluator test
The workflow can be tested with the dataset included in the 'dataset' directory.
After adding the scripts to your $PATH, go to the 'test_dataset/' directory:
cd test_dataset/
Test GTEvaluator_matrixMaker:
../src/GTEvaluator_matrixMaker -i Bacillus_genomes.txt -p Bacillus_targets.tsv -o matrix_test.tsv
Test GTEvaluator_statistic:
../src/GTEvaluator_statistic -i matrix_test.tsv -o test
Or run the whole workflow directly with GTEvaluator:
../src/GTEvaluator -i Bacillus_genomes.txt -p Bacillus_targets.tsv -o test
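GTEvaluator automates the two-step test above. A minimal Python 3 sketch of an equivalent driver, using only the '-i', '-p' and '-o' flags documented here and passing the matrix file from the first step to the second; build_commands and run_workflow are hypothetical, and the real GTEvaluator also accepts the other parameters of both scripts.

```python
import subprocess


def build_commands(genomes, targets, matrix_path="matrix_test.tsv", prefix="test"):
    """Return the two command lines, in the order GTEvaluator runs the steps."""
    make_matrix = ["GTEvaluator_matrixMaker", "-i", genomes, "-p", targets,
                   "-o", matrix_path]
    # The matrix written by the first step is the input of the second.
    statistics = ["GTEvaluator_statistic", "-i", matrix_path, "-o", prefix]
    return [make_matrix, statistics]


def run_workflow(genomes, targets, matrix_path="matrix_test.tsv", prefix="test"):
    """Run both steps in sequence; stop with CalledProcessError if one fails."""
    for command in build_commands(genomes, targets, matrix_path, prefix):
        subprocess.check_call(command)
```

From 'test_dataset/', with both scripts on your $PATH, `run_workflow("Bacillus_genomes.txt", "Bacillus_targets.tsv")` mirrors the two separate test commands.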