Can I keep protein groups separate?
ht-lau opened this issue · 1 comments
Hi,
I think it is more appropriate to open a new thread on this.
I wonder if there is a way to keep protein group separate between ambiguous peptides. For example,
peptide A, GRIA1; GRIA2
peptide B, GRIA1
remove_ambiguous_proteingroups = FALSE will output GRIA1;GRIA2
remove_ambiguous_proteingroups = TRUE will output GRIA1
I would like to know if there is a way to keep both GRIA1 and GRIA1;GRIA2 in the output, even if I have to manipulate the input. The ability to keep them apart can be very valuable, based on these recent manuscripts:
https://doi.org/10.1038/s41467-023-41558-2
https://doi.org/10.1101/2023.09.19.558203
Thanks
HT
I'm not sure exactly what you are asking, so I will try to disentangle this by clarifying how MS-DAP deals with peptide-to-protein mappings from A to Z.
- raw data processing software identifies peptides and performs protein inference; this software applies some algorithm (e.g. based on IDpicker or ProteinProphet) to assign observed peptides to proteingroups. At this point, decisions are made on how to deal with unique and shared peptides (e.g. "razor" peptides might be assigned to one proteingroup in a winner-takes-all approach). How exactly this is done depends on the respective software (DIA-NN / Spectronaut / MaxQuant / FragPipe / etc.)
- when you load your dataset into MS-DAP, the peptide-to-proteingroup assignments are used as-is. So the "protein_id" assigned to each peptide/precursor in the MS-DAP peptide data table is exactly the same as provided by upstream software. So if upstream software states that peptide X is assigned to "GRIA1" and peptide Y to "GRIA1;GRIA2", we assume that is correct / makes sense.
Note that MS-DAP cannot know at this point which peptides are assigned to proteingroup A because they are unique for the respective protein, and which are "razor peptides" that are assigned to A by lack of other evidence. Hence, we use all input data as-is and offer no further control over peptide-to-protein assignments at the moment.
- differential expression analysis (DEA) in MS-DAP yields proteingroup-level statistics (log2fc and p-value) for each statistical contrast. For each proteingroup in the output (e.g. differential_abundance_analysis.xlsx) the respective set of unique gene symbols is shown (gene_symbols_or_id column) and, importantly, this is still consistent with the input data (i.e. for each row/proteingroup, the results are based on the subset of peptides originally assigned to this proteingroup that also pass all filtering rules that you defined, such as filter_min_detect).
Note that at this stage, we are looking at proteingroup-level statistics (e.g. output from DEqMS or MSqRob). Peptide information is lost at this point (i.e. the proteingroup info is a summary of respective peptides).
- if you want to summarize/simplify the DEA results, you may use the new summarise_stats function introduced with MS-DAP 1.0.6; a documentation page is available here. All this function does is filter and summarise the DEA results (which only contain proteingroup-level information). For example, if you set remove_ambiguous_proteingroups=TRUE it will filter the DEA table and simply remove all rows where the gene_symbols_or_id column contains multiple gene symbols.
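To make the filtering rule in the last point concrete, here is a minimal Python sketch of the logic as described above. It is not MS-DAP code (MS-DAP is an R package) and does not call its API: the function name filter_dea, the toy table and the assumption that gene symbols are separated by ";" are all made up for illustration. Only the column name gene_symbols_or_id and the parameter name remove_ambiguous_proteingroups are taken from this thread.

```python
# Toy stand-in for a proteingroup-level DEA table. Hypothetical rows, not real MS-DAP output.
dea_rows = [
    {"gene_symbols_or_id": "GRIA1"},
    {"gene_symbols_or_id": "GRIA1;GRIA2"},
]


def filter_dea(rows, remove_ambiguous_proteingroups=False):
    """Optionally drop rows whose gene_symbols_or_id lists more than one gene symbol.

    Assumption: multiple gene symbols are separated by ';'.
    """
    if not remove_ambiguous_proteingroups:
        # Nothing is removed: every proteingroup row is kept as-is.
        return list(rows)
    kept = []
    for row in rows:
        symbols = [s.strip() for s in row["gene_symbols_or_id"].split(";") if s.strip()]
        if len(symbols) == 1:
            kept.append(row)
    return kept


kept_all = filter_dea(dea_rows, remove_ambiguous_proteingroups=False)
kept_unambiguous = filter_dea(dea_rows, remove_ambiguous_proteingroups=True)

print([row["gene_symbols_or_id"] for row in kept_all])
print([row["gene_symbols_or_id"] for row in kept_unambiguous])
```

In this sketch the first call keeps both the GRIA1 and the GRIA1;GRIA2 rows, while the second call keeps only the GRIA1 row. No rows are merged in either case; the option only removes rows.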
Specific to your question;
I wonder if there is a way to keep protein group separate between ambiguous peptides. For example,
peptide A, GRIA1; GRIA2
peptide B, GRIA1
If the upstream software assigned peptides A and B in your example to different proteingroups, they will also be in separate proteingroups throughout MS-DAP. As mentioned above, we use the peptide-to-proteingroup assignments from the dataset you import from DIA-NN/FragPipe/etc. as-is.
I would like to know if there is a way that I can keep both GRIA1 and GRIA1;GRIA2 in the output
They are by default.
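As an illustration of why both groups survive, the following Python sketch groups peptides by the protein_id string exactly as provided upstream. It is a simplified model of the behaviour described in this reply, not MS-DAP's implementation; the function name group_by_proteingroup and the toy rows are hypothetical, and the gene symbols from the example are used as stand-ins for real protein identifiers.

```python
from collections import defaultdict

# Toy peptide table mirroring the example in the question (hypothetical data).
peptides = [
    {"peptide": "peptide A", "protein_id": "GRIA1;GRIA2"},
    {"peptide": "peptide B", "protein_id": "GRIA1"},
]


def group_by_proteingroup(peptide_rows):
    """Group peptides by their upstream protein_id, used verbatim as the group key."""
    groups = defaultdict(list)
    for row in peptide_rows:
        # The protein_id string is never split or merged here, so "GRIA1" and
        # "GRIA1;GRIA2" remain two distinct proteingroups.
        groups[row["protein_id"]].append(row["peptide"])
    return dict(groups)


proteingroups = group_by_proteingroup(peptides)
for protein_id, members in sorted(proteingroups.items()):
    print(protein_id, members)
```

Because the group key is the unmodified protein_id, peptide A and peptide B end up in two separate groups, which matches the "as-is" handling described above.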