SmallTools

This repository contains a collection of small tools, each of which runs independently to perform a small task.

snpExtractorSEG

snpExtractorSEG is a tool that calculates Shannon Entropy (SE). Scripts that follow this naming convention use SNP files generated by Geneious, hence the "G" in the name. The tool also accepts a region file, in which the user can specify how to split the SE calculation into regions of interest.

    *****   
        snpExtractorSEG.py is a SNP extractor tool that filters desired SNPS
        from Geneious (based on ver. R9.1) SNP files (usually in csv format) 
        and calculates Shannon Entropy (SE) by codons.

        Positions without recorded SNPs will be treated as 0.

        Note on Optional Arguments:

        -   By default this tool calculates SE with any nucleotide changes.
        -   Adding optional arguments -in and/or -is will produce extra files
            that filter SE calculated for non-synonymous and/or synonymous
            mutations separately.

        Note - ignores insertion/deletions/frameshifts/truncation
               in current version.
    *****
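To make the entropy calculation concrete, here is a minimal sketch of Shannon Entropy for a single position. It is not taken from snpExtractorSEG itself. It assumes that frequencies are given as fractions of 1, that any frequency not claimed by a variant belongs to the unchanged (reference) state, and that the logarithm is base 2. The function name `shannon_entropy` is hypothetical.

```python
import math


def shannon_entropy(frequencies, base=2.0):
    """Shannon Entropy of one position from its variant frequencies.

    Assumptions (not confirmed by the tool): frequencies are fractions of 1,
    and the remainder up to 1.0 is the unchanged reference state.
    """
    probs = [f for f in frequencies if f > 0]
    remainder = 1.0 - sum(probs)
    if remainder > 1e-12:
        probs.append(remainder)
    # H = sum over states of -p * log(p)
    return sum(-p * math.log(p, base) for p in probs)


# A position without recorded SNPs has a single state, so its entropy is 0.
no_snp = shannon_entropy([])
# One variant at 50% frequency gives the two-state maximum of 1 bit.
half = shannon_entropy([0.5])
```

With these assumptions, a position that has no recorded SNPs evaluates to 0, which matches the behaviour described in the help text above.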
    
usage: snpExtractorSEG5.py [-h] -f FILE -s S_COL -pt PT_COL -pe PE_COL -fq
                           FREQ_COL -cc CDN_CHG -aa AA_CHG
                           [-p PROTEIN_SEGMENT] [-is] [-in] [-d DIRECTORY]
                           [-seo]

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  File to extract snp information from.
  -s S_COL, --s_col S_COL
                        The column position storing the positions of
                        mutations.
  -pt PT_COL, --pt_col PT_COL
                        The column position storing polymorphism type.
  -pe PE_COL, --pe_col PE_COL
                        The column position storing protein effect.
  -fq FREQ_COL, --freq_col FREQ_COL
                        The column position storing the frequencies.
  -cc CDN_CHG, --cdn_chg CDN_CHG
                        The column position storing codon change.
  -aa AA_CHG, --aa_chg AA_CHG
                        The column position storing amino acid change.
  -p PROTEIN_SEGMENT, --protein_segment PROTEIN_SEGMENT
                        Protein region file for calculating avg SE over all
                        positions
  -is, --include_syn    Include SE output for synonymous mutations only.
  -in, --include_nonsyn
                        Include SE output for non-synonymous mutations only.
  -d DIRECTORY, --directory DIRECTORY
                        User specified directory to save results, otherwise
                        saves at current location
  -seo, --SE_output     Retrieve the files containing SE calculations per
                        position
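As an illustration of how the required arguments fit together, the sketch below assembles a command line from Python. Only the flags come from the usage text above. The helper name `build_command`, the file names, and every column number are hypothetical, and whether columns are counted from 0 or 1 is not stated here, so check that against your own SNP file.

```python
import sys


def build_command(csv_path, columns, region_file=None, out_dir=None,
                  include_syn=False, include_nonsyn=False, per_position=False):
    """Assemble an argument list for snpExtractorSEG5.py (hypothetical helper)."""
    cmd = [sys.executable, "snpExtractorSEG5.py", "-f", csv_path]
    # Required column arguments, in the order shown in the usage text.
    for flag in ("-s", "-pt", "-pe", "-fq", "-cc", "-aa"):
        cmd += [flag, str(columns[flag])]
    if region_file:
        cmd += ["-p", region_file]
    if out_dir:
        cmd += ["-d", out_dir]
    if include_syn:
        cmd.append("-is")
    if include_nonsyn:
        cmd.append("-in")
    if per_position:
        cmd.append("-seo")
    return cmd


# Made-up column numbers and file names, for illustration only.
example_columns = {"-s": 1, "-pt": 2, "-pe": 3, "-fq": 4, "-cc": 5, "-aa": 6}
cmd = build_command("snps.csv", example_columns,
                    region_file="regions.csv", include_nonsyn=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths and column numbers match your data.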

  • snpExtractorSEG4.1.py -> Accepts a region file in which each region is a continuous range. For instance: Region1 spans positions 1-500.

  • snpExtractorSEG5.py -> Accepts a region file in which each region is a list of separate positions. For instance: Region1 includes positions 1, 6, 25, and 35.

An example region file for snpExtractorSEG4.1.py, in .csv format (region name, start position, end position):

Region1,1,573
Region2,574,1149
Region3,1150,2238
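The following sketch shows one way a range-style region file like the one above could be read and used to average SE over every position in a region. It is not the tool's own code. It assumes the end position is inclusive and that positions without a recorded entropy count as 0. The function names and the entropy values are made up for illustration.

```python
import csv
import io


def read_range_regions(handle):
    """Map each region name to the range of positions it spans."""
    regions = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        name, start, end = row[0].strip(), int(row[1]), int(row[2])
        regions[name] = range(start, end + 1)  # assumed inclusive end
    return regions


def mean_entropy(positions, entropy_by_position):
    """Average SE over all positions; missing positions count as 0."""
    positions = list(positions)
    if not positions:
        return 0.0
    total = sum(entropy_by_position.get(p, 0.0) for p in positions)
    return total / len(positions)


example = io.StringIO("Region1,1,573\nRegion2,574,1149\nRegion3,1150,2238\n")
regions = read_range_regions(example)

# Made-up per-position entropies: only two positions carry SNPs.
entropies = {10: 0.5, 600: 1.0}
averages = {name: mean_entropy(span, entropies)
            for name, span in regions.items()}
```

Because the average is taken over all positions in the range, a region with few SNPs yields a value close to 0 even when individual positions have high entropy.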

An example region file for snpExtractorSEG5.py, in .csv format (region name followed by the individual positions it includes):

Region1,1,6,25,35
Region2,510,1149
Region3,511,2238,2290
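A position-list region file like the one above can be read in a similar way. The sketch below is an assumption about the format, not the tool's implementation: it treats every field after the region name as one individual position, so `Region2,510,1149` means the two positions 510 and 1149 rather than the range between them. The function name `read_position_regions` is hypothetical.

```python
import csv
import io


def read_position_regions(handle):
    """Map each region name to its explicit list of positions."""
    regions = {}
    for row in csv.reader(handle):
        fields = [field.strip() for field in row if field.strip()]
        if not fields:
            continue  # skip blank lines
        regions[fields[0]] = [int(field) for field in fields[1:]]
    return regions


example = io.StringIO(
    "Region1,1,6,25,35\nRegion2,510,1149\nRegion3,511,2238,2290\n"
)
regions = read_position_regions(example)
for name, positions in regions.items():
    print(name, positions)
```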

Written with StackEdit.