This repository contains a collection of small standalone tools, each of which performs a single small task.
snpExtractorSEG is a tool that calculates Shannon Entropy (SE). Scripts following this naming convention use SNP files generated by Geneious, hence the "G" in the name. The tool takes a region file in which the user specifies how to split the SE calculation into regions.
*****
snpExtractorSEG.py is a SNP extractor tool that filters desired SNPs
from Geneious (based on ver. R9.1) SNP files (usually in CSV format)
and calculates Shannon Entropy (SE) by codons.
Positions without recorded SNPs will be treated as 0.
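As a rough illustration of the per-position calculation, the sketch below computes Shannon entropy from the variant frequencies recorded at one position. It assumes the frequencies are fractions between 0 and 1, that the leftover frequency belongs to the reference, and that the log base is 2; the tool's actual log base, frequency parsing, and grouping by codon are not described here and are not modelled. A position with no recorded SNPs comes out as 0, consistent with the behaviour described above.

```python
import math
from typing import Iterable


def shannon_entropy(variant_freqs: Iterable[float]) -> float:
    """Shannon entropy (in bits) at a single position.

    ASSUMPTIONS: frequencies are fractions in [0, 1], the reference takes
    whatever frequency is left over, and the log base is 2.
    """
    freqs = [f for f in variant_freqs if f > 0]
    ref = 1.0 - sum(freqs)
    if ref < -1e-9:
        raise ValueError("variant frequencies sum to more than 1")
    # Drop a reference share that is only floating-point noise.
    probs = freqs + ([ref] if ref > 1e-12 else [])
    # H = -sum(p * log2(p)); with no SNPs, probs == [1.0] and H == 0.0
    return sum(-p * math.log2(p) for p in probs)


print(shannon_entropy([]))     # no recorded SNPs -> 0.0
print(shannon_entropy([0.5]))  # reference 0.5, one variant 0.5 -> 1.0
```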
Note on optional arguments:
- By default, this tool calculates SE over all nucleotide changes.
- Adding the optional arguments -in and/or -is produces extra files
that report SE calculated separately for non-synonymous and/or synonymous
mutations.
Note: the current version ignores insertions, deletions, frameshifts,
and truncations.
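One way to split recorded SNPs into the default, synonymous (-is), and non-synonymous (-in) outputs, while skipping the change types the note above says are ignored, is sketched below. The protein-effect labels used here ("None" for synonymous, "Substitution" for non-synonymous, anything else skipped, including blank cells) are assumptions about Geneious R9.1 exports rather than values confirmed by this README, and the helper names `classify_snp` and `split_rows` are hypothetical.

```python
from typing import Dict, List, Optional

# ASSUMPTION: Geneious "Protein Effect" labels; check them against your own export.
SYNONYMOUS = {"None"}
NON_SYNONYMOUS = {"Substitution"}


def classify_snp(protein_effect: str) -> Optional[str]:
    """Return 'syn', 'nonsyn', or None for an ignored change type
    (e.g. insertion, deletion, frame shift, truncation, or a blank cell)."""
    effect = protein_effect.strip()
    if effect in SYNONYMOUS:
        return "syn"
    if effect in NON_SYNONYMOUS:
        return "nonsyn"
    return None


def split_rows(rows: List[List[str]], pe_col: int) -> Dict[str, List[List[str]]]:
    """Group CSV rows by mutation class; pe_col is a 0-based column index."""
    groups: Dict[str, List[List[str]]] = {"all": [], "syn": [], "nonsyn": []}
    for row in rows:
        kind = classify_snp(row[pe_col])
        if kind is None:
            continue  # skip change types the tool ignores
        groups["all"].append(row)  # default output
        groups[kind].append(row)   # extra -is / -in output
    return groups
```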
*****
usage: snpExtractorSEG5.py [-h] -f FILE -s S_COL -pt PT_COL -pe PE_COL -fq
                           FREQ_COL -cc CDN_CHG -aa AA_CHG
                           [-p PROTEIN_SEGMENT] [-is] [-in] [-d DIRECTORY]
                           [-seo]

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  File to extract snp information from.
  -s S_COL, --s_col S_COL
                        The column position storing the positions of
                        mutations.
  -pt PT_COL, --pt_col PT_COL
                        The column position storing polymorphism type.
  -pe PE_COL, --pe_col PE_COL
                        The column position storing protein effect.
  -fq FREQ_COL, --freq_col FREQ_COL
                        The column position storing the frequencies.
  -cc CDN_CHG, --cdn_chg CDN_CHG
                        The column position storing codon change.
  -aa AA_CHG, --aa_chg AA_CHG
                        The column position storing amino acid change.
  -p PROTEIN_SEGMENT, --protein_segment PROTEIN_SEGMENT
                        Protein region file for calculating avg SE over all
                        positions
  -is, --include_syn    Include SE output for synonymous mutations only.
  -in, --include_nonsyn
                        Include SE output for non-synonymous mutations only.
  -d DIRECTORY, --directory DIRECTORY
                        User specified directory to save results, otherwise
                        saves at current location
  -seo, --SE_output     Retrieve the files containing SE calculations per
                        position
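The interface above could be declared with Python's standard `argparse` module roughly as follows. This is a reconstruction from the help text, not the script's actual source; treating the column positions as integers is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the snpExtractorSEG5.py interface from its help text."""
    p = argparse.ArgumentParser(prog="snpExtractorSEG5.py")
    p.add_argument("-f", "--file", required=True,
                   help="File to extract snp information from.")
    # Required column positions in the Geneious CSV (ASSUMED to be integers).
    for short_opt, long_opt, what in [
        ("-s", "--s_col", "the positions of mutations"),
        ("-pt", "--pt_col", "polymorphism type"),
        ("-pe", "--pe_col", "protein effect"),
        ("-fq", "--freq_col", "the frequencies"),
        ("-cc", "--cdn_chg", "codon change"),
        ("-aa", "--aa_chg", "amino acid change"),
    ]:
        p.add_argument(short_opt, long_opt, type=int, required=True,
                       help=f"The column position storing {what}.")
    p.add_argument("-p", "--protein_segment",
                   help="Protein region file for calculating avg SE over all positions")
    p.add_argument("-is", "--include_syn", action="store_true",
                   help="Include SE output for synonymous mutations only.")
    p.add_argument("-in", "--include_nonsyn", action="store_true",
                   help="Include SE output for non-synonymous mutations only.")
    p.add_argument("-d", "--directory",
                   help="User specified directory to save results, "
                        "otherwise saves at current location")
    p.add_argument("-seo", "--SE_output", action="store_true",
                   help="Retrieve the files containing SE calculations per position")
    return p


args = build_parser().parse_args(
    ["-f", "snps.csv", "-s", "1", "-pt", "2", "-pe", "3",
     "-fq", "4", "-cc", "5", "-aa", "6", "-in"])
```

Because each multi-letter single-dash flag (such as `-is` or `-seo`) is registered as its own option string, `argparse` matches it exactly instead of splitting it into single-letter flags.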
- snpExtractorSEG4.1.py -> Accepts a region file in which each region is a contiguous range of positions. For example, Region1 spans positions 1-500.
- snpExtractorSEG5.py -> Accepts a region file in which each region is a list of individual, possibly non-adjacent, positions. For example, Region1 includes positions 1, 6, 25, and 35.
Example region file for snpExtractorSEG4.1.py (CSV format, one region per row: name, start position, end position):
Region1,1,573
Region2,574,1149
Region3,1150,2238
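To show how a continuous region file could be used, the sketch below reads rows of the form `name,start,end`, treats both ends as inclusive (an assumption), and averages per-position SE over each region, counting positions without SNPs as 0 as described above. The function names and the position-to-SE dictionary used as input are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_continuous_regions(handle: TextIO) -> List[Tuple[str, int, int]]:
    """Parse 'name,start,end' rows; both ends are assumed inclusive."""
    regions = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        name, start, end = row[0].strip(), int(row[1]), int(row[2])
        if end < start:
            raise ValueError(f"{name}: end {end} is before start {start}")
        regions.append((name, start, end))
    return regions


def average_se(regions: List[Tuple[str, int, int]],
               se_by_pos: Dict[int, float]) -> Dict[str, float]:
    """Mean SE per region; positions absent from se_by_pos count as 0."""
    averages = {}
    for name, start, end in regions:
        positions = range(start, end + 1)
        averages[name] = sum(se_by_pos.get(p, 0.0) for p in positions) / len(positions)
    return averages


example = "Region1,1,573\nRegion2,574,1149\nRegion3,1150,2238\n"
regions = read_continuous_regions(io.StringIO(example))
averages = average_se(regions, {1: 1.0, 574: 2.0})
```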
Example region file for snpExtractorSEG5.py (CSV format, one region per row: name followed by each position it includes):
Region1,1,6,25,35
Region2,510,1149
Region3,511,2238,2290
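For the separated format, each row lists a region name followed by the individual positions it includes, so the average runs over exactly those positions. A minimal sketch under the same assumptions (hypothetical function names, SE supplied as a position-to-value dictionary, positions without SNPs counted as 0):

```python
import csv
import io
from typing import Dict, List, TextIO


def read_position_regions(handle: TextIO) -> Dict[str, List[int]]:
    """Parse 'name,pos1,pos2,...' rows into {name: [positions]}."""
    regions: Dict[str, List[int]] = {}
    for row in csv.reader(handle):
        cells = [c.strip() for c in row if c.strip()]
        if not cells:
            continue  # skip blank lines
        regions[cells[0]] = [int(c) for c in cells[1:]]
    return regions


def average_se_listed(regions: Dict[str, List[int]],
                      se_by_pos: Dict[int, float]) -> Dict[str, float]:
    """Mean SE over each region's listed positions (missing positions = 0)."""
    return {
        name: sum(se_by_pos.get(p, 0.0) for p in positions) / len(positions)
        for name, positions in regions.items()
        if positions  # skip regions with no positions to avoid dividing by zero
    }


example = "Region1,1,6,25,35\nRegion2,510,1149\nRegion3,511,2238,2290\n"
regions = read_position_regions(io.StringIO(example))
averages = average_se_listed(regions, {1: 1.0, 6: 0.5, 2290: 0.3})
```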