Comparison of GO Terms between lists
BioinformaNicks opened this issue · 7 comments
Dear authors,
I would like to ask for an example of using compare_gos.py to compare occurrences of GO IDs between two different sets/lists.
I have a table like this for example:
Protein \t GO ID \t Function
X \t GO:0005737 \t cytoplasm
And a similar one for another condition. How do I check whether certain GO IDs are enriched in one condition compared to the other?
Thank you for taking the time to contact us and for the great question. Other users will surely benefit from the answer. I will add a link to this issue in the README.md so they can see how to run the script.
Here is an example that you can run if you clone the goatools repo:
$ scripts/compare_gos.py data/compare_gos/tat_gos_simple1.tsv data/compare_gos/tat_gos_simple2.tsv
XX GO:0008150 BP 29210 D00 biological_process
XX GO:0065007 BP 12884 D01 biological regulation
XX GO:0050789 BP 11559 D02 regulation of biological process
XX GO:0050794 BP 8746 D03 regulation of cellular process
X. GO:0048519 BP 3641 D03 negative regulation of biological process
.X GO:0048518 BP 3519 D03 positive regulation of biological process
X. GO:0048523 BP 2662 D04 negative regulation of cellular process
# Marker keys:
# X -> GO is present in tat_gos_simple1
# X -> GO is present in tat_gos_simple2
Your file format should work just fine. Here is a sample. GO IDs in i162a.tsv and i162b.tsv will be compared:
Contents of i162a.tsv
Protein \t GO ID \t Function
X \t GO:0005737 \t cytoplasm
X \t GO:0048523 \t cytoplasm
Contents of i162b.tsv
Protein \t GO ID \t Function
X \t GO:0005737 \t cytoplasm
X \t GO:0048518 \t cytoplasm
Run:
$ scripts/compare_gos.py i162a.tsv i162b.tsv
.X GO:0048518 BP 3519 D03 positive regulation of biological process
X. GO:0048523 BP 2662 D04 negative regulation of cellular process
XX GO:0005737 CC 1200 D02 cytoplasm
# Marker keys:
# X -> GO is present in i162a
# X -> GO is present in i162b
The compare_gos script picks up all GO IDs on a line using regex.
Lines beginning with # are considered comments and are ignored, even if they contain GO IDs.
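For readers who want to see the idea in code, here is a minimal Python sketch of that behavior. It is not the compare_gos.py source. It assumes GO IDs have the form GO: followed by seven digits, and the names extract_go_ids and compare_go_sets are hypothetical helpers introduced only for this illustration:

```python
import re

# Assumption: a GO ID is "GO:" followed by exactly seven digits.
GO_PATTERN = re.compile(r"GO:\d{7}")


def extract_go_ids(lines):
    """Collect every GO ID found on non-comment lines (hypothetical helper)."""
    go_ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # comment lines are skipped, even if they contain GO IDs
        go_ids.update(GO_PATTERN.findall(line))
    return go_ids


def compare_go_sets(gos_a, gos_b):
    """Return (marker, GO ID) pairs: 'X.' only in a, '.X' only in b, 'XX' in both."""
    rows = []
    for go_id in sorted(gos_a | gos_b):
        marker = ("X" if go_id in gos_a else ".") + ("X" if go_id in gos_b else ".")
        rows.append((marker, go_id))
    return rows


# Sample lines mirroring i162a.tsv and i162b.tsv, plus one comment line.
file_a = ["Protein\tGO ID\tFunction", "X\tGO:0005737\tcytoplasm", "X\tGO:0048523\tcytoplasm"]
file_b = ["Protein\tGO ID\tFunction", "X\tGO:0005737\tcytoplasm", "X\tGO:0048518\tcytoplasm",
          "# GO:0008150 is ignored because this line is a comment"]

for marker, go_id in compare_go_sets(extract_go_ids(file_a), extract_go_ids(file_b)):
    print(marker, go_id)
```

Unlike the real script, this sketch does not look up namespaces, depths, or term names. It only reproduces the presence/absence markers.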
Thanks for creating this script and the library, it's really helpful. Is there a way to save the output of the comparison in a format that can be imported by another program, such as TSV or CSV?
EDIT: I found the solution by going through the script. I can use the --xlsx option to write the output to an Excel file.
Thank you so much for your interest in GOATOOLS and for taking the time to write to us. I would like to add a TSV format too. I am putting this on our TODO list.
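Until then, a tab-separated table of presence markers can be produced with Python's standard csv module. The following is only a rough sketch, not part of GOATOOLS. The function write_comparison_tsv, the column layout, and the example GO ID sets are assumptions made for illustration:

```python
import csv
import io


def write_comparison_tsv(rows, handle, name_a="i162a", name_b="i162b"):
    """Write (GO ID, in_a, in_b) rows as tab-separated text with a header (hypothetical helper)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GO", name_a, name_b])
    for go_id, in_a, in_b in rows:
        # "X" marks presence and "." marks absence, as in the script's text output.
        writer.writerow([go_id, "X" if in_a else ".", "X" if in_b else "."])


# Example GO ID sets taken from the i162a.tsv / i162b.tsv sample above.
gos_a = {"GO:0005737", "GO:0048523"}
gos_b = {"GO:0005737", "GO:0048518"}
rows = [(go, go in gos_a, go in gos_b) for go in sorted(gos_a | gos_b)]

# Written to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_comparison_tsv(rows, buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead, open the file with newline="" as the csv module documentation recommends and pass that handle in place of the buffer.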
Please let us know any more information that might be relevant to this issue or your thoughts about any specific features or implementations.
Issue appears to be resolved with the --xlsx option, as reported.