tanghaibao/goatools

Comparison of GO Terms between list

BioinformaNicks opened this issue · 7 comments

Dear authors,

I would like to ask for an example of using compare_gos.py to compare occurrences of GO IDs between two different sets/lists.

I have a table like this for example:

Protein \t GO ID \t Function
X \t GO:0005737 \t cytoplasm

And a similar one for another condition. How do I compare whether certain GO IDs are enriched in one condition relative to the other?

Thank you for taking the time to contact us and for the great question. Other users will surely benefit from the answer. I will add a link to this issue in the README.md so they can see how to run the script.

Here is an example that you can run if you clone the goatools repo:

$ scripts/compare_gos.py data/compare_gos/tat_gos_simple1.tsv data/compare_gos/tat_gos_simple2.tsv

XX GO:0008150 BP 29210 D00  biological_process
XX GO:0065007 BP 12884 D01  biological regulation
XX GO:0050789 BP 11559 D02  regulation of biological process
XX GO:0050794 BP  8746 D03  regulation of cellular process
X. GO:0048519 BP  3641 D03  negative regulation of biological process
.X GO:0048518 BP  3519 D03  positive regulation of biological process
X. GO:0048523 BP  2662 D04  negative regulation of cellular process

# Marker keys:
#     X -> GO is present in tat_gos_simple1
#     X -> GO is present in tat_gos_simple2

Your file format should work just fine. Here is a sample. GO IDs in i162a.tsv and i162b.tsv will be compared:

Contents of i162a.tsv

Protein \t GO ID \t Function
X \t GO:0005737 \t cytoplasm
X \t GO:0048523 \t cytoplasm

Contents of i162b.tsv

Protein \t GO ID \t Function
X \t GO:0005737 \t cytoplasm
X \t GO:0048518 \t cytoplasm

Run:

$ scripts/compare_gos.py i162a.tsv i162b.tsv
.X GO:0048518 BP  3519 D03  positive regulation of biological process
X. GO:0048523 BP  2662 D04  negative regulation of cellular process
XX GO:0005737 CC  1200 D02  cytoplasm

# Marker keys:
#     X -> GO is present in i162a
#     X -> GO is present in i162b

The compare_gos script picks up all GO IDs on a line using regex.

Lines beginning with # are considered comments and are ignored, even if they contain GO IDs.
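
To make the two rules above concrete, here is a minimal Python sketch of the same idea. It is not the code in compare_gos.py: the pattern `GO:\d{7}` and the helper names `extract_go_ids` and `compare_go_sets` are assumptions for illustration. It also sorts by GO ID only, and does not look up the namespace, counts, depth, or names that the real script prints.

```python
import re

# Assumed pattern: "GO:" followed by seven digits, e.g. GO:0005737
GO_PATTERN = re.compile(r"GO:\d{7}")


def extract_go_ids(lines):
    """Collect every GO ID found on non-comment lines."""
    go_ids = set()
    for line in lines:
        # Lines beginning with '#' are treated as comments and skipped,
        # even if they contain GO IDs.
        if line.startswith("#"):
            continue
        go_ids.update(GO_PATTERN.findall(line))
    return go_ids


def compare_go_sets(gos_a, gos_b):
    """Return (marker, GO ID) pairs; 'X' = present, '.' = absent, per file."""
    rows = []
    for go_id in sorted(gos_a | gos_b):
        marker = ("X" if go_id in gos_a else ".") + ("X" if go_id in gos_b else ".")
        rows.append((marker, go_id))
    return rows


# The contents of i162a.tsv and i162b.tsv from above, plus one comment line
lines_a = [
    "Protein\tGO ID\tFunction",
    "# GO:0008150 appears only in this comment, so it is ignored",
    "X\tGO:0005737\tcytoplasm",
    "X\tGO:0048523\tcytoplasm",
]
lines_b = [
    "Protein\tGO ID\tFunction",
    "X\tGO:0005737\tcytoplasm",
    "X\tGO:0048518\tcytoplasm",
]

rows = compare_go_sets(extract_go_ids(lines_a), extract_go_ids(lines_b))
for marker, go_id in rows:
    print(marker, go_id)
```

Because the IDs are found by pattern matching anywhere on the line, the column order and the header row of your table do not matter.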

Thanks for creating this script and the library, it's really helpful. Is there a way to save the output of the comparison in a format that can be imported by another program, such as TSV or CSV?

EDIT: I found the solution by going through the script. I can use the --xlsx option to write the output as an Excel file.

Thank you so much for your interest in GOATOOLS and for taking the time to write to us. I would like to add a TSV format too, and I am putting this on our TODO list.
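
Until a TSV option exists, one possible workaround is to write a presence/absence table yourself with Python's standard csv module. This is only a sketch: the function name `write_comparison_tsv` and the column layout (GO ID, then one 1/0 column per file) are assumptions, not a format that GOATOOLS produces.

```python
import csv
import io


def write_comparison_tsv(gos_a, gos_b, handle, name_a="a", name_b="b"):
    """Write one row per GO ID with 1/0 flags for presence in each set."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GO", name_a, name_b])
    for go_id in sorted(set(gos_a) | set(gos_b)):
        writer.writerow([go_id, int(go_id in gos_a), int(go_id in gos_b)])


# GO IDs from the i162a.tsv / i162b.tsv example above
gos_a = {"GO:0005737", "GO:0048523"}
gos_b = {"GO:0005737", "GO:0048518"}

# An in-memory buffer is used here; pass an open file handle to write to disk
buf = io.StringIO()
write_comparison_tsv(gos_a, gos_b, buf, name_a="i162a", name_b="i162b")
tsv_text = buf.getvalue()
print(tsv_text, end="")
```

To save to a file instead, open it with `open("compare.tsv", "w", newline="")` (the file name is just an example) and pass that handle in place of the buffer.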

Please let us know any more information that might be relevant to this issue or your thoughts about any specific features or implementations.

Issue appears to be resolved with --xlsx option as reported.