bigbio/quantms

SAGE error with modification not supported

### Description of the bug

 25CPTAC_LUAD_W_BI_20180901_KR_f04.mzML 25CPTAC_LUAD_W_BI_20180901_KR_f05.mzML 25CPTAC_LUAD_W_BI_20180901_KR_f14.mzML 25CPTAC_LUAD_W_BI_20180901_KR_f20.mzML 25CPTAC_LUAD_W_BI_20180901_KR_f22.mzML 25CPTAC_LUAD_W_BI_20180901_KR_f23.mzML \
      -out out_0_sage.idXML \
      -threads 6 \
      -database "Homo-sapiens-uniprot-reviewed-contaminants-decoy-202210.fasta" \
      -decoy_prefix DECOY_ \
      -min_len 6 \
      -max_len 40 \
      -min_matched_peaks 1 \
      -min_peaks 1 \
      -max_peaks 500 \
      -missed_cleavages 2 \
      -report_psms 1 \
      -enzyme "Trypsin" \
      -precursor_tol_left -20.0 \
      -precursor_tol_right 20.0 \
      -precursor_tol_unit ppm \
      -fragment_tol_left -20.0 \
      -fragment_tol_right 20.0 \
      -fragment_tol_unit ppm \
      -fixed_modifications 'Carbamidomethyl (C)' 'Carbamidomethyl (U)' 'TMT6plex (N-term)' 'TMT6plex (K)' \
      -variable_modifications 'Acetyl (Protein N-term)' 'Deamidated (N)' 'Gln->pyro-Glu (N-term Q)' 'Oxidation (M)' 'Pyro-carbamidomethyl (N-term C)' \
      -max_variable_mods 3 \
      -isotope_error_range 0,1 \
      -PeptideIndexing:IL_equivalent \
      -PeptideIndexing:unmatched_action warn \
      -debug 0 \
       \
      2>&1 | tee out_0_sage.log
  
  if [[ 625 -ge 2 ]]; then
      IDRipper -in out_0_sage.idXML -out . -split_ident_runs
      rm out_0_sage.idXML
      for f in *.idXML
      do
          mv "$f" "${f%.*}_sage.idXML"
      done
  fi
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:TMT:ID:DATABASESEARCHENGINES:SEARCHENGINESAGE":
      SageAdapter: $(SageAdapter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
      sage: $(sage 2>&1 | grep -E 'Version [0-9]+\.[0-9]+\.[0-9]+')
  END_VERSIONS

Command exit status:
  8

Command output:
  Found Sage version string: Version 0.13.4
  Error: Unexpected internal error (the value 'Pyro-carbamidomethyl (N-term C)' was used but is not valid; Modification not found: )

Command wrapper:
  Found Sage version string: Version 0.13.4
  Error: Unexpected internal error (the value 'Pyro-carbamidomethyl (N-term C)' was used but is not valid; Modification not found: )

Work dir:
  /hps/nobackup/juan/pride/reanalysis/differential-expression/tmt/PDC000153/work/39/d70d1337992834b21d39e1913942b5

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Probably just a bug in the adapter. Should be supported.

I have no idea why the modified peptide generator needs to be used here: https://github.com/OpenMS/OpenMS/blob/ecfd8431856a16f04d21d001191294005ebe7745/src/topp/SageAdapter.cpp#L324C22-L324C22

But it's probably the problem.

I think:

  1. We can't assign a static Carbamidomethyl (C) and a variable mod at C.
  2. Pyro-carbamidomethyl (N-term C): the delta is relative to Cys and not to the carbamidomethylated one. We can't use both on the same residue.
  3. It doesn't use the ModifiedPeptideGenerator but just uses a static helper in its namespace (to get details of the fixed mod).

A temporary solution could be to use both as variable modifications. This should give the correct delta masses.
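
To spot this kind of clash before launching a search, something like the following could be run over the parameter lists. It is only a rough sketch: `target_residue` and `find_conflicts` are hypothetical helpers written for this comment, not part of OpenMS or quantms, and they assume the modification strings follow the `Name (site)` pattern seen in the command above.

```python
import re
from typing import Dict, List, Optional, Tuple

# Modification lists copied from the failing SageAdapter command.
FIXED = ["Carbamidomethyl (C)", "Carbamidomethyl (U)",
         "TMT6plex (N-term)", "TMT6plex (K)"]
VARIABLE = ["Acetyl (Protein N-term)", "Deamidated (N)",
            "Gln->pyro-Glu (N-term Q)", "Oxidation (M)",
            "Pyro-carbamidomethyl (N-term C)"]


def target_residue(mod: str) -> Optional[str]:
    """Return the residue letter a mod string targets, or None if it is terminus-only.

    Hypothetical helper: assumes the site is the last token inside the
    trailing parentheses, e.g. 'N-term C' -> 'C', 'N-term' -> None.
    """
    match = re.search(r"\(([^)]*)\)\s*$", mod)
    if not match or not match.group(1).split():
        return None
    site = match.group(1).split()[-1]
    if len(site) == 1 and site.isalpha() and site.isupper():
        return site
    return None


def find_conflicts(fixed: List[str],
                   variable: List[str]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Map each residue that has both fixed and variable mods to those mods."""
    fixed_by_residue: Dict[str, List[str]] = {}
    for mod in fixed:
        residue = target_residue(mod)
        if residue is not None:
            fixed_by_residue.setdefault(residue, []).append(mod)

    conflicts: Dict[str, Tuple[List[str], List[str]]] = {}
    for mod in variable:
        residue = target_residue(mod)
        if residue in fixed_by_residue:
            entry = conflicts.setdefault(residue, (fixed_by_residue[residue], []))
            entry[1].append(mod)
    return conflicts


if __name__ == "__main__":
    for residue, (fixed_mods, variable_mods) in find_conflicts(FIXED, VARIABLE).items():
        print(f"{residue}: fixed={fixed_mods} variable={variable_mods}")
```

For the parameters in this issue, the only residue flagged is C, which matches the mod named in the error message.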

On second thought: could it be that the modification they actually want is ammonia loss? Applied to carbamidomethylated Cys, it seems to give Pyro-carbamidomethyl (C):

      <umod:mod title="Ammonia-loss" full_name="Loss of ammonia" username_of_poster="unimod"
         <umod:specificity hidden="0" site="C" position="Any N-term" classification="Artefact"
                           spec_group="3">
            <umod:misc_notes>Pyro-carbamidomethyl as a delta from Carbamidomethyl-Cys</umod:misc_notes>
         </umod:specificity>
         <umod:specificity hidden="1" site="S" position="Protein N-term"
                           classification="Post-translational"
                           spec_group="2"/>
         <umod:specificity hidden="1" site="T" position="Protein N-term"
                           classification="Post-translational"
                           spec_group="1"/>
         <umod:specificity hidden="1" site="N" position="Anywhere" classification="Chemical derivative"
                           spec_group="4">
            <umod:misc_notes>N-Succinimide</umod:misc_notes>
         </umod:specificity>

Interesting that the same config was passed to COMETAdapter and it works.

Correct, that's why I think it's a bug in the "helper" function.

For example, I don't know why one would want to look up by full ID if the residue and terminus are known.
I'm not sure that's supported.
https://github.com/OpenMS/OpenMS/blob/ecfd8431856a16f04d21d001191294005ebe7745/src/openms/source/CHEMISTRY/ModifiedPeptideGenerator.cpp#L48C8-L48C8

I will give it a look.

> Interesting that the same config was passed to COMETAdapter and it works.

If I'm following correctly, I would assume this is because Comet (and, I think, MSFragger as well) adds the numeric value of a variable mod to that of the fixed mod on the same residue, i.e. it applies a final delta mass of V+F. Sage applies one and only one modification to a residue (they are not additive), so the full delta mass needs to be specified for every mod.

For example, Comet might expect +57 fixed and -17 variable for CAM/pyro-CAM. Sage would expect +57 static and +40 variable, and those are the values that will appear in the modified peptide sequences.
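
The arithmetic behind those numbers can be sketched as below. The two monoisotopic deltas are the Unimod values for Carbamidomethyl and Ammonia-loss as far as I know them; the function name is made up for illustration, and the additive versus non-additive behaviour is taken from the description above rather than from either engine's source.

```python
# Monoisotopic delta masses in Da (Unimod values, stated here as an assumption).
CARBAMIDOMETHYL = 57.021464   # fixed mod on Cys ("CAM")
AMMONIA_LOSS = -17.026549     # pyro-CAM expressed relative to CAM-Cys


def absolute_variable_delta(fixed_delta: float, relative_delta: float) -> float:
    """Convert a variable delta given relative to the fixed-modified residue
    into a delta relative to the unmodified residue.

    Hypothetical helper: an engine that stacks variable on fixed mods
    (as described for Comet) takes `relative_delta` directly, while an
    engine that applies exactly one mod per residue (as described for Sage)
    needs the sum returned here.
    """
    return fixed_delta + relative_delta


if __name__ == "__main__":
    pyro_cam = absolute_variable_delta(CARBAMIDOMETHYL, AMMONIA_LOSS)
    print(f"additive engine:     fixed {CARBAMIDOMETHYL:+.4f}, variable {AMMONIA_LOSS:+.4f}")
    print(f"non-additive engine: static {CARBAMIDOMETHYL:+.4f}, variable {pyro_cam:+.4f}")
```

The sum comes out at roughly +39.99 Da, which is the "+40" quoted above and corresponds to the Pyro-carbamidomethyl delta measured from unmodified Cys.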

This has been fixed in OpenMS by PR OpenMS/OpenMS#7080.