How to use GO to compare 2 sets of data

Question


Closed this issue 6 months ago · 10 comments

Hello,
I would like to use GO to compare two sets of proteomic data: one from a wild-type mouse and one from a mouse with a specific disease. I can't figure out how to do this. I went to enrichment analysis (which I have done for metabolomics using a different program), but it gave me enrichment scores for only one set of data and no fold differences between the two sets, which does not make sense to me. How do I compare the two sets of data?

Thank you,

Ellen

Answer 1 · 2024-05-23T14:54:37.000Z

Hi @ellenhascats,

Can you provide a little more detail? Include the tool you're using, the URL you accessed the tool at, a sample line of your data, and a sample of the output. Thanks!

Answer 2 · 2024-05-23T15:04:14.000Z

Also, check out https://geneontology.org/docs/go-enrichment-analysis/. GO enrichment analysis tells you which GO terms are over/under-represented in your list compared to the background, so it's not necessarily appropriate for all use cases.
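To make "over/under-represented compared to the background" concrete: over-representation tools ask whether a GO term turns up in your list more often than chance would predict, given the background list. A minimal sketch of the one-sided hypergeometric test commonly used for this (equivalent to a one-sided Fisher's exact test), assuming you already have the four counts. The function name is hypothetical, and real tools also correct for testing many GO terms at once, which is omitted here.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: proteins in the background (reference) list
    K: background proteins annotated to the GO term
    n: proteins in the study list (a subset of the background)
    k: study proteins annotated to the GO term
    """
    upper = min(n, K)  # the overlap cannot exceed either set size
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

A small p-value means the term is annotated to more of your study proteins than a random draw of the same size from the background would usually give.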

Answer 3 · 2024-05-23T15:07:04.000Z

Hi @ellenhascats it sounds like you are looking for a tool that allows multiple set enrichment. This might be a helpful summary for you https://wiki.flybase.org/wiki/FlyBase:GSEA.

Answer 4 · 2024-05-23T15:32:45.000Z

Hi Suzie, I would like to compare two sets of data produced by mass spec analysis. Retinas from wild-type mice and disease-model mice were prepared as mitochondria-enriched fractions and analyzed by mass spec. I received a data analysis with protein IDs, p-values and fold differences between the wild type and the disease model. I would like to map the differences between the wild type and the disease model to see whether specific mitochondrial pathways are perturbed in the diseased mouse. I have not used any tools for this yet because I am not sure what is appropriate. Out of curiosity, I put the wild-type data into GO Enrichment Analysis at https://geneontology.org just to see what would happen, and was surprised to get an enrichment analysis back with just one data set. However, it would not be accurate, because this was a mitochondria-enriched fraction, so the ratios within the data set will not be correct. I hope that helps! Thank you! Ellen

Ellen R. Weiss, PhD, Department of Cell Biology & Physiology, The University of North Carolina, CB# 7090, 5340B MBRB, 111 Mason Farm Rd., Chapel Hill, NC 27599-7090, 919-966-7683 (office & voice mail), 919-843-9648 (lab)

Answer 5 · 2024-05-23T15:33:52.000Z

Thank you! I will look at this. Ellen

Answer 6 · 2024-05-23T17:29:05.000Z

Hi @ellenhascats it sounds like inputting the correct 'background', as Suzie described, is what you need to do here. As you said, the WT sample will already be 'enriched' for mitochondrial processes because it is a mitochondrial prep, and unless a background is selected it will be compared against the rest of the genome. To make a background set you will need to combine everything that has been 'seen' by mass spec, i.e. everything that is detectable by the method.
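Building that background amounts to taking the union of every protein ID detected in any run, wild type or disease. A minimal sketch, assuming each run is available as a list of protein IDs; the function names and example IDs are hypothetical, and the one-ID-per-line file is only a common upload layout, so check each tool's documentation for its exact format.

```python
from typing import Iterable


def build_background(*runs: Iterable[str]) -> set[str]:
    """Union of every protein ID detected in any run (WT or disease model)."""
    background: set[str] = set()
    for run in runs:
        background.update(run)
    return background


def write_id_list(ids: Iterable[str], path: str) -> None:
    """Write one ID per line, sorted, for uploading as a reference list."""
    with open(path, "w") as fh:
        fh.write("\n".join(sorted(ids)) + "\n")


# Hypothetical runs: placeholder IDs, not real data
wt_run = ["P1", "P2", "P3"]
disease_run = ["P2", "P3", "P4"]
background = build_background(wt_run, disease_run)
```

Using the union rather than either run alone keeps proteins that were detected in only one condition, which are often the interesting ones.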

Selecting the set to be analysed is up to you: for example, proteins that pass a fold-difference threshold and have a significant p-value. Perhaps start with a 2-fold difference. A volcano plot may also help you understand the distribution of your data. Perhaps the group who did the mass spec has specific software they can recommend?
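The selection rule above (at least 2-fold in either direction, significant p-value) can be sketched as a simple filter. This assumes each row of the results holds a protein ID, a fold change expressed as a linear disease/WT ratio (not already log-transformed), and a p-value; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ProteinResult:
    protein_id: str
    fold_change: float  # assumed linear ratio, disease / wild type (must be > 0)
    p_value: float


def select_changed(results, min_fold=2.0, alpha=0.05):
    """IDs changed by at least `min_fold` in either direction with p < alpha."""
    cutoff = math.log2(min_fold)  # 2-fold -> |log2 FC| >= 1
    return [
        r.protein_id
        for r in results
        if abs(math.log2(r.fold_change)) >= cutoff and r.p_value < alpha
    ]
```

Working in log2 space makes a 2-fold increase (ratio 2.0) and a 2-fold decrease (ratio 0.5) symmetric around zero.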

Answer 7 · 2024-05-23T18:10:53.000Z

Thank you! I am trying to get the mass spec people to respond. They say they do not use GO because they do not trust it (??). And they say so far they have not used any “annotation software” for their projects. I wrote back for absolute clarification and asked if they do this all manually. I am in contact with another lab here at UNC that used GO and they did do volcano plots. The issue is simply getting to the top of their “to do” list. :-) Ellen

Answer 8 · 2024-05-23T21:31:28.000Z

Would we use this? https://cbl-gorilla.cs.technion.ac.il/ However, there is still no place to put in numbers… I am puzzled. Both sets may have many of the same proteins, but the amount of each protein may be different. Ellen
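On the "no place to put in numbers" question: one common workaround is a ranked-list analysis, where the quantitative information is carried by the order of the IDs rather than by a separate column. GOrilla, for instance, offers a single-ranked-list running mode alongside its target-vs-background mode. A minimal sketch that orders protein IDs by log2 fold change, assuming fold changes are linear disease/WT ratios; the function name is hypothetical, and whether a given tool accepts protein IDs or needs gene symbols should be checked in its documentation.

```python
import math


def ranked_ids(fold_changes: dict[str, float], descending: bool = True) -> list[str]:
    """Order protein IDs by log2(fold change); most increased first by default.

    log2 is monotonic, so this is the same order as sorting the linear ratios;
    it is used here only so that increases and decreases read symmetrically.
    """
    return sorted(
        fold_changes,
        key=lambda pid: math.log2(fold_changes[pid]),
        reverse=descending,
    )
```

Ranking by signed -log10(p) instead of fold change is another common choice; which metric suits this data set is a judgment call.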

Answer 9 · 2024-05-24T12:36:36.000Z

Hi @ellenhascats we can't really give you advice on how to use a tool made by another group. You may want to contact GOrilla's support with specific questions about their tool, and look at the documentation they provide. I think you really need someone who can go through the data you've been handed directly. There are so many potential variables that could influence how you treat/analyse the data.

In response to:
"I am puzzled. Both sets may have many of the same proteins but the amount of each protein may be different.":

For GSEA, the set of proteins you want to test for enrichment will be a subset of the disease-model proteins identified. Perhaps start by selecting those with a fold change of at least 2. Ideally you will have done replicates, so you will have a p-value of <0.05 (0.01, even better) and know that the change is significant. For enrichment, you can either input your own background (all proteins that the mass spec detected across all experiments here) or run without a specified background and see what you get.
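To see why the choice of background matters so much for a mitochondria-enriched prep, the same one-sided hypergeometric test can be run against two backgrounds. All counts below are hypothetical, chosen only to mimic a whole-genome background versus a detected-proteins background in which most proteins are already mitochondrial; they are not real data.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k); see the argument names below."""
    # N = background size, K = background proteins with the term,
    # n = study list size, k = study proteins with the term
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)


# Hypothetical counts for one mitochondrial GO term (illustration only)
study_size, study_hits = 50, 30

# Whole-genome background: the term is rare, so 30/50 looks extreme
p_genome = overrepresentation_p(study_hits, study_size, K=1100, N=20000)

# Detected-proteins background: the term is already common in the prep
p_detected = overrepresentation_p(study_hits, study_size, K=1100, N=2000)
```

With these made-up numbers the term looks overwhelmingly enriched against the genome but unremarkable against what the mass spec could actually detect, which is the artifact the custom background is meant to remove.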

I would suggest trying a few different GSEA tools and playing with the settings to get a feel for your data and what the output looks like. Maybe group the enrichment set into proteins that go up and those that go down?
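Splitting the selected proteins by direction gives two lists that can each be submitted separately. A minimal sketch, assuming rows of (protein ID, linear disease/WT fold change, p-value); the function name is hypothetical and the default thresholds follow the 2-fold, p < 0.05 starting point suggested above.

```python
import math


def split_by_direction(results, min_fold=2.0, alpha=0.05):
    """Return (up, down) lists of protein IDs that pass both thresholds."""
    cutoff = math.log2(min_fold)
    up, down = [], []
    for protein_id, fold_change, p_value in results:
        if p_value >= alpha:
            continue  # not significant: goes in neither list
        lfc = math.log2(fold_change)
        if lfc >= cutoff:
            up.append(protein_id)  # higher in the disease model
        elif lfc <= -cutoff:
            down.append(protein_id)  # lower in the disease model
    return up, down
```

Testing the two lists separately avoids a term with some proteins going up and others going down being diluted into an unclear result.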

For this type of experiment, if you have done replicates, I really do suggest you also do a volcano plot first. It will help you see what is happening across the set of proteins and might give you some interesting leads.
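A volcano plot places each protein at x = log2 fold change and y = -log10 p-value, so strongly changed, significant proteins end up in the upper left and upper right. A minimal sketch that computes those coordinates and a label per protein, assuming the same (ID, linear fold change, p-value) rows; the function name is hypothetical, and the resulting points can be handed to any plotting library, for example as a scatter plot coloured by label.

```python
import math


def volcano_points(results, min_fold=2.0, alpha=0.05):
    """Return (protein_id, x, y, label) tuples: x = log2 FC, y = -log10 p."""
    x_cut = math.log2(min_fold)
    y_cut = -math.log10(alpha)  # p < alpha  <=>  y > y_cut
    points = []
    for protein_id, fold_change, p_value in results:
        x = math.log2(fold_change)
        y = -math.log10(p_value)
        if y > y_cut and abs(x) >= x_cut:
            label = "up" if x > 0 else "down"
        else:
            label = "not significant"
        points.append((protein_id, x, y, label))
    return points
```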

Answer 10 · 2024-05-24T13:12:08.000Z

Thank you Helen. I will get some more advice on analyzing the data. Ellen