How to use GO to compare 2 sets of data
Hello,
I would like to use GO to compare 2 sets of proteomic data -- one from a wild-type mouse and one from a mouse with a specific disease. I can't figure out how to do this. I tried enrichment analysis (which I have done for metabolomics using a different program), but it gave me enrichment scores for only one set of data and no fold differences between the two sets, which does not make sense to me. How do I compare the 2 sets of data?
Thank you,
Ellen
Hi @ellenhascats,
Can you provide a little more detail? Include the tool you're using, the URL you accessed the tool at, a sample line of your data, and a sample of the output. Thanks!
Also, check out https://geneontology.org/docs/go-enrichment-analysis/. GO enrichment analysis tells you which GO terms are over/under-represented in your list compared to the background, so it's not necessarily appropriate for all use cases.
Hi @ellenhascats it sounds like you are looking for a tool that allows multiple set enrichment. This might be a helpful summary for you https://wiki.flybase.org/wiki/FlyBase:GSEA.
Hi @ellenhascats it sounds like inputting the correct 'background', as Suzie described, is what you need to do here. As you said, the wt sample will already be 'enriched' for mitochondrial processes as it is a mitochondrial prep, and unless a background is selected the comparison will be against the rest of the genome. To make a background set you will need to combine everything that has been 'seen' in mass spec, i.e. what is detectable by the method.
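As a rough illustration of "combine everything that has been seen", here is a minimal Python sketch. The protein identifiers and the function name build_background are made up for the example; in practice each set would come from your own mass spec result tables.

```python
# Hypothetical identifiers of proteins detected in each mass spec run.
wt_detected = {"ProtA", "ProtB", "ProtC"}
disease_detected = {"ProtB", "ProtC", "ProtD"}


def build_background(*runs):
    """Return the union of all proteins detected in any of the given runs."""
    background = set()
    for run in runs:
        background |= set(run)
    return background


# Everything the method was able to detect, across both samples.
background = build_background(wt_detected, disease_detected)
print(sorted(background))
```

The resulting list is what you would supply as the custom background (sometimes called the reference list) in an enrichment tool that accepts one.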
Selecting the set to be analysed is up to you - a certain threshold for fold-difference with a significant p-value. Perhaps start with a 2-fold difference. A volcano plot may also help you to understand the distribution of your data - perhaps the group who did the mass spec have some specific software that they can recommend?
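To make the threshold idea concrete, here is a minimal Python sketch that keeps proteins with at least a 2-fold change in either direction and a p-value below 0.05. The abundance values, p-values, and the name select_changed are invented for illustration; the cut-offs are just the starting points suggested above and should be adjusted to your data.

```python
import math

# Hypothetical per-protein results: (protein, mean WT, mean disease, p-value).
results = [
    ("ProtA", 100.0, 250.0, 0.003),
    ("ProtB", 100.0, 110.0, 0.40),
    ("ProtC", 200.0, 80.0, 0.01),
    ("ProtD", 50.0, 150.0, 0.20),
]


def select_changed(rows, min_fold=2.0, max_p=0.05):
    """Keep proteins whose fold change (up or down) and p-value pass the cut-offs."""
    threshold = math.log2(min_fold)
    selected = []
    for name, wt, disease, p_value in rows:
        log2_fc = math.log2(disease / wt)
        # abs() treats a 2-fold decrease the same as a 2-fold increase.
        if abs(log2_fc) >= threshold and p_value < max_p:
            selected.append((name, log2_fc))
    return selected


selected = select_changed(results)
print(selected)
```

The sign of the log2 fold change tells you whether a protein went up (positive) or down (negative) in the disease sample, which is handy if you later want to analyse the two directions separately.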
Hi @ellenhascats we can't really give you advice on how to use a tool from another group. You should maybe contact the GOrilla help desk with specific questions on how to use their tool, and look at the documentation that they provide. I think that you really need someone who can directly go through the data you've been handed. There are so many potential variables that could influence how you treat/analyse the data.
In response to:
"I am puzzled. Both sets may have many of the same proteins but the amount of each protein may be different.":
For GSEA, the set of proteins you want to test for enrichment will be a subset of the disease model proteins identified. Perhaps start by selecting those with a fold change of at least 2 - ideally you will have done replicates and so you will have a p-value of <0.05 (0.01, even better), so that you know the change is significant. For enrichment, you can either input your own background (all proteins that the mass spec has detected across all experiments here) or run it without a specified background and see what you get out.
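To show why the background matters, here is a minimal Python sketch of a one-sided hypergeometric over-representation test for a single GO term. This is only an illustration of the general idea, not the implementation of any particular tool, and it is not the ranked-list GSEA method. The protein sets and the name enrichment_p_value are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(study, background, annotated):
    """One-sided hypergeometric p-value for over-representation of one term.

    study      -- proteins selected as changed
    background -- all proteins detected by the method
    annotated  -- proteins annotated to the GO term of interest
    """
    background = set(background)
    # Only proteins present in the background can be counted.
    study = set(study) & background
    annotated = set(annotated) & background
    n_background = len(background)
    n_annotated = len(annotated)
    n_study = len(study)
    n_hits = len(study & annotated)
    # Probability of seeing n_hits or more annotated proteins by chance.
    favourable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_study - i)
        for i in range(n_hits, min(n_annotated, n_study) + 1)
    )
    return n_hits, favourable / comb(n_background, n_study)


# Hypothetical example: 10 detected proteins, 3 annotated to the term,
# 3 selected as changed, 2 of which carry the annotation.
background = {f"P{i}" for i in range(1, 11)}
annotated = {"P1", "P2", "P3"}
study = {"P1", "P2", "P9"}

hits, p_value = enrichment_p_value(study, background, annotated)
print(hits, round(p_value, 4))
```

With a larger background (for example the whole genome instead of what the mass spec detected), the same hit count would give a smaller p-value, which is why a mitochondrial prep looks 'enriched' for mitochondrial terms when no background is supplied.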
I would suggest trying a few different GSEA tools and playing with the settings to get a feel for your data and what the output looks like. Maybe group the enrichment set into those that go up and those that go down?
For this type of experiment, if you have done replicates, I really do suggest you also do a volcano plot first. It will help you see what is happening across the set of proteins and might give you some interesting leads.
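For reference, a volcano plot puts the log2 fold change on the x-axis and -log10 of the p-value on the y-axis. Here is a minimal Python sketch that computes those coordinates; the input values and the name volcano_points are made up, and the actual plotting is left to whatever software you prefer.

```python
import math

# Hypothetical per-protein results: (protein, mean WT, mean disease, p-value).
results = [
    ("ProtA", 100.0, 400.0, 0.01),
    ("ProtB", 100.0, 100.0, 0.5),
    ("ProtC", 200.0, 50.0, 0.001),
]


def volcano_points(rows):
    """Return (protein, log2 fold change, -log10 p-value) for each row."""
    points = []
    for name, wt, disease, p_value in rows:
        x = math.log2(disease / wt)   # right = up in disease, left = down
        y = -math.log10(p_value)      # higher = more significant
        points.append((name, x, y))
    return points


points = volcano_points(results)
for name, x, y in points:
    print(f"{name}\t{x:.2f}\t{y:.2f}")
```

Proteins that land in the upper left and upper right corners of the plot are the ones with both a large change and a small p-value, so they are natural candidates for the set you test for enrichment.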