
The current main function of this project is to help people with .gaf file analysis.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

GOFindBias: Analysis tool for finding bias in the GAF files.

GOFindBias provides summary statistics about a GAF file to help determine whether conclusions drawn from gene ontology studies may be biased by abstract terms or by high-throughput experiments (1).

The tool provides the following statistics:

  1. Shannon's equitability.
  2. Top 'n' PubMed IDs and GO terms.
  3. Kolmogorov-Smirnov (KS) test to compare two different GAF files.
  4. Terms shared by two GAF files among the 'n' most frequent terms of each, with their respective frequencies.
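
As a rough illustration of the first two statistics, the sketch below computes Shannon's equitability (Shannon entropy divided by the natural log of the number of distinct terms) and a top 'n' list from term counts. It is written for Python 3 and is not taken from the GOFindBias source: the function name `shannon_equitability` and the sample GO term list are hypothetical, and the tool's own calculation may differ in detail.

```python
import math
from collections import Counter


def shannon_equitability(counts):
    """Return Shannon's equitability E = H / ln(S) for a list of counts.

    H is the Shannon entropy of the count distribution and S is the
    number of categories with a non-zero count.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    categories = len(counts)
    if categories < 2:
        # ln(1) = 0, so the ratio is undefined; report 0.0 by convention here.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(categories)


# Hypothetical annotations, not taken from a real GAF file.
go_terms = ["GO:0005515"] * 6 + ["GO:0005634"] * 3 + ["GO:0003677"]
freq = Counter(go_terms)

top_n = freq.most_common(2)  # the two most frequent terms with counts
equitability = shannon_equitability(list(freq.values()))
```

A value close to 1 means annotations are spread evenly across terms, while a value close to 0 means a few terms dominate the file.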

Prerequisites:

Required modules.

Modules are available in most GNU/Linux distributions, or from their respective websites.

Installation

Installing from source

git clone https://github.com/Pranavkhade/GOFindBias
cd GOFindBias
python setup.py install

Installing with pip

pip install GOFindBias

OR

pip install git+https://github.com/Pranavkhade/GOFindBias

Files and instructions

  1. Collect the GAF file you wish to analyse. For reference GAF files you can visit ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
  2. You can also use the GAF files obtained as an output from the debias program.
  3. Instructions and help for the parameters are as follows:
usage: GOFindBias.py [-h]
                     (-i GAF_FILE [GAF_FILE ...] | -cmpr FILENAME FILENAME)
                     [-ls 1 OR O] [-ts TOP]
                     [-e EVIDENCE_CODE [EVIDENCE_CODE ...]]

GOFindBias is an analytical tool built to analyse the .gaf files. Please
visit: https://github.com/Pranavkhade/GOFindBias for more details.

optional arguments:
  -h, --help            show this help message and exit
  -i GAF_FILE [GAF_FILE ...], --input GAF_FILE [GAF_FILE ...]
                        Names of the input GAF file(s).
  -cmpr FILENAME FILENAME, --compare FILENAME FILENAME
                        Names of the two GAF files
  -ls 1 OR O, --logscale 1 OR O
                        For graphs, 0: Counts in normal scale 1: Counts in log
                        scale [default=0]
  -ts TOP, --topstat TOP
                        Top n statistics sorted from highest to lowest
                        [default=10]
  -e EVIDENCE_CODE [EVIDENCE_CODE ...], --evidence EVIDENCE_CODE [EVIDENCE_CODE ...]
                        Accepts Standard Evidence Codes outlined in
                        (http://geneontology.org/page/guide-go-evidence-
                        codes). All 3 letter code for each standard evidence
                        is acceptable. In addition to that EXPEC is accepted
                        which will pull out all annotations which are made
                        experimentally. COMPEC will extract all annotations
                        which have been done computationally. Similarly,
                        AUTHEC and CUREC are also accepted.
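
The effect of the -e option can be pictured as a filter over the tab-separated annotation lines of a GAF file. The sketch below is a minimal Python 3 illustration, not the GOFindBias implementation. It assumes the GAF 2.x column layout (GO ID in column 5, DB:Reference in column 6, evidence code in column 7, aspect in column 9), the function name `read_annotations` is hypothetical, and the membership of the `EXPEC` set is an assumption based on the standard experimental evidence codes; the tool's own grouping may differ. The sample lines are shortened to nine columns for readability.

```python
import io

# Assumed grouping of experimental evidence codes; GOFindBias may define
# EXPEC differently.
EXPEC = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def read_annotations(handle, evidence=None):
    """Yield (go_id, reference, evidence_code, aspect) for each GAF line.

    Lines starting with '!' are header/comment lines and are skipped.
    If `evidence` is given, only annotations whose code is in it are kept.
    """
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete annotation line
        go_id, reference, code, aspect = cols[4], cols[5], cols[6], cols[8]
        if evidence is None or code in evidence:
            yield go_id, reference, code, aspect


# Hypothetical two-annotation file held in memory.
sample = io.StringIO(
    "!gaf-version: 2.1\n"
    "UniProtKB\tP12345\tABC1\t\tGO:0005515\tPMID:1111\tIPI\t\tF\n"
    "UniProtKB\tP67890\tXYZ2\t\tGO:0005634\tGO_REF:0000002\tIEA\t\tC\n"
)
experimental = list(read_annotations(sample, evidence=EXPEC))
```

Here only the IPI annotation survives the filter, because IEA is an electronically inferred code.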

Examples

  1. GOFindBias -i test/2014.gaf -ls 1 -ts 10
     This command parses 2014.gaf and plots all GO term counts on a natural log scale for easier visual comparison. The last argument (-ts 10) sets the number of top 'n' entries with the highest counts to report. Graphs are written to the /graph_output folder, with names corresponding to the GO/PubMed ID count and the ontology aspect (F/C/P). The file Shannon's_Statistics.txt contains information about the diversity of the given .gaf file.

  2. GOFindBias -cmpr test/2014.gaf test/2015.gaf -ts 50
     This command compares the two files and reports the p-value of the KS (non-parametric) test. It also creates COMPARE.txt, which lists the GO terms and PMIDs common to the top 50 (n) most frequent terms of each GAF file.

  3. GOFindBias -i test/2014.gaf test/2015.gaf test/2016.gaf -e EXPEC
     This command parses all the listed files and reports statistics for experimental evidence codes only.
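
The comparison in the second example can be sketched as follows. This is a hedged Python 3 illustration of the two ideas involved, not the tool's own code. The function names `ks_statistic` and `mutual_top_terms` and the sample counts are hypothetical, and which values GOFindBias actually feeds to the KS test is an assumption (term counts are used here). The sketch computes only the KS statistic D; if SciPy is installed, `scipy.stats.ks_2samp` returns both the statistic and a p-value.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def mutual_top_terms(freq_a, freq_b, n):
    """Terms present in the top n of both files, with both frequencies."""
    top_a = dict(freq_a.most_common(n))
    top_b = dict(freq_b.most_common(n))
    return {term: (top_a[term], top_b[term]) for term in top_a if term in top_b}


# Hypothetical GO term counts for two GAF files.
freq_2014 = Counter({"GO:0005515": 40, "GO:0005634": 25, "GO:0003677": 5})
freq_2015 = Counter({"GO:0005515": 60, "GO:0016020": 30, "GO:0005634": 10})

common = mutual_top_terms(freq_2014, freq_2015, n=2)
d = ks_statistic(list(freq_2014.values()), list(freq_2015.values()))
```

With these counts, GO:0005515 is the only term in the top two of both files, so it is the only entry in `common`.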

NOTE

  1. If you are using an Anaconda environment, make sure that Python reads libraries from "~anaconda2/lib/python2.7/site-packages/lib". Alternatively, copy the files from that location to a path from which Python can already import other libraries.
  2. You can find a few test .gaf files in the /lib/test/ folder.