SAGE search engine score is missing after psm re-scoring using percolator
Description of the bug
SAGE's search engine score should be the hyperscore. pyopenms can extract it from the idXMLs produced by the search engine step, but after PSM re-scoring with Percolator it is missing from the idXMLs.
idXML before PSM re-scoring:
<PeptideIdentification score_type="hyperscore" higher_score_better="true" significance_threshold="0.0" MZ="988.485223533228918" RT="2776.47440000000006" spectrum_reference="controllerType=0 controllerNumber=1 scan=16570" >
<PeptideHit score="4.372861652440132" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2" aa_before="R" aa_after="G" start="363" end="383" protein_refs="PH_0" >
<UserParam type="string" name="target_decoy" value="target"/>
<UserParam type="string" name="ln(-poisson)" value="3.14389626700959"/>
<UserParam type="string" name="ln(delta_best)" value="0.0"/>
<UserParam type="string" name="ln(delta_next)" value="3.819337728782414"/>
<UserParam type="string" name="ln(matched_intensity_pct)" value="3.5826106"/>
<UserParam type="string" name="longest_b" value="9"/>
<UserParam type="string" name="longest_y" value="18"/>
<UserParam type="string" name="longest_y_pct" value="0.85714287"/>
<UserParam type="string" name="matched_peaks" value="27"/>
<UserParam type="string" name="scored_candidates" value="8222"/>
<UserParam type="string" name="protein_references" value="unique"/>
</PeptideHit>
<UserParam type="string" name="PinSpecId" value="312"/>
</PeptideIdentification>
idXML after PSM re-scoring:
<PeptideIdentification score_type="Posterior Error Probability" higher_score_better="false" significance_threshold="0.0" MZ="988.485223533228918" RT="2776.47440000000006" spectrum_reference="controllerType=0 controllerNumber=1 scan=16570" >
<PeptideHit score="4.70852e-08" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2" aa_before="R" aa_after="G" start="363" end="383" protein_refs="PH_7718" >
<UserParam type="string" name="target_decoy" value="target"/>
<UserParam type="string" name="ln(-poisson)" value="3.14389626700959"/>
<UserParam type="string" name="ln(delta_best)" value="0.0"/>
<UserParam type="string" name="ln(delta_next)" value="3.819337728782414"/>
<UserParam type="string" name="ln(matched_intensity_pct)" value="3.5826106"/>
<UserParam type="string" name="longest_b" value="9"/>
<UserParam type="string" name="longest_y" value="18"/>
<UserParam type="string" name="longest_y_pct" value="0.85714287"/>
<UserParam type="string" name="matched_peaks" value="27"/>
<UserParam type="string" name="scored_candidates" value="8222"/>
<UserParam type="string" name="protein_references" value="unique"/>
<UserParam type="float" name="MS:1001492" value="2.65249"/>
<UserParam type="float" name="MS:1001491" value="7.304600000000001e-04"/>
<UserParam type="float" name="MS:1001493" value="4.70852e-08"/>
</PeptideHit>
<UserParam type="string" name="PinSpecId" value="312"/>
</PeptideIdentification>
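The difference between the two records can be checked with a minimal sketch like the one below. It uses only Python's standard `xml.etree.ElementTree` on abbreviated copies of the fragments above, not pyopenms; the helper name `summarize_hit` is hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def summarize_hit(fragment: str) -> dict:
    """Return main score type, main score and UserParams of the first PeptideHit."""
    pep_id = ET.fromstring(fragment)
    hit = pep_id.find("PeptideHit")
    return {
        "score_type": pep_id.get("score_type"),
        "score": float(hit.get("score")),
        # Only direct UserParam children of the hit, i.e. its meta values.
        "user_params": {p.get("name"): p.get("value") for p in hit.findall("UserParam")},
    }


# Abbreviated copies of the two records shown above.
BEFORE = """<PeptideIdentification score_type="hyperscore" higher_score_better="true">
  <PeptideHit score="4.372861652440132" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2">
    <UserParam type="string" name="matched_peaks" value="27"/>
  </PeptideHit>
</PeptideIdentification>"""

AFTER = """<PeptideIdentification score_type="Posterior Error Probability" higher_score_better="false">
  <PeptideHit score="4.70852e-08" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2">
    <UserParam type="string" name="matched_peaks" value="27"/>
    <UserParam type="float" name="MS:1001492" value="2.65249"/>
    <UserParam type="float" name="MS:1001491" value="7.304600000000001e-04"/>
    <UserParam type="float" name="MS:1001493" value="4.70852e-08"/>
  </PeptideHit>
</PeptideIdentification>"""

before = summarize_hit(BEFORE)
after = summarize_hit(AFTER)

# Before re-scoring the hyperscore is the main score; afterwards the main score
# is the posterior error probability and no UserParam carries the hyperscore.
print(before["score_type"], before["score"])
print(after["score_type"], "hyperscore" in after["user_params"])
```

In other words, the hyperscore only ever existed as the main score, so replacing the main score drops it.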
Command used and terminal output
No response
Relevant files
No response
System information
No response
How is this with other search engines?
It might be because PSMFeatureExtractor can be and is skipped with Sage.
I guess we are not taking the SAGE output but the pin file from percolator?
@jpfeuffer @ypriverol It should be the SAGE search output. Comet and MSGF+ have their search scores in a MetaValue of every PeptideHit, but SAGE does not.
What does an idXML for Comet look like after PSMFeatureExtractor?
The Comet search engine score is xcorr, stored as MetaValue MS:1002252. It already exists before PSM re-scoring.
<PeptideIdentification score_type="Posterior Error Probability" higher_score_better="false" significance_threshold="0.0" MZ="474.761474031899979" RT="1815.299999999999955" spectrum_reference="controllerType=0 controllerNumber=1 scan=3727" >
<PeptideHit score="0.990159" sequence="LSGATLQMK" charge="2" aa_before="K" aa_after="R" start="48" end="56" protein_refs="PH_1080" >
<UserParam type="string" name="target_decoy" value="decoy"/>
<UserParam type="string" name="MS:1002258" value="6"/>
<UserParam type="string" name="MS:1002259" value="16"/>
<UserParam type="string" name="num_matched_peptides" value="1060"/>
<UserParam type="int" name="isotope_error" value="0"/>
<UserParam type="float" name="MS:1002252" value="1.116"/>
<UserParam type="float" name="MS:1002253" value="1.0"/>
<UserParam type="float" name="MS:1002254" value="0.0"/>
<UserParam type="float" name="MS:1002255" value="113.900000000000006"/>
<UserParam type="float" name="MS:1002256" value="11.0"/>
<UserParam type="float" name="MS:1002257" value="2.89"/>
<UserParam type="string" name="protein_references" value="unique"/>
<UserParam type="float" name="COMET:deltCn" value="1.0"/>
<UserParam type="float" name="COMET:deltLCn" value="0.0"/>
<UserParam type="float" name="COMET:lnExpect" value="1.061256502124341"/>
<UserParam type="float" name="COMET:lnNumSP" value="6.966024187106113"/>
<UserParam type="float" name="COMET:lnRankSP" value="2.397895272798371"/>
<UserParam type="float" name="COMET:IonFrac" value="0.375"/>
<UserParam type="float" name="MS:1001492" value="-0.641415"/>
<UserParam type="float" name="MS:1001491" value="0.270715"/>
<UserParam type="float" name="MS:1001493" value="0.990159"/>
</PeptideHit>
</PeptideIdentification>
But this is after rescoring. I need to see before.
<PeptideIdentification score_type="expect" higher_score_better="false" significance_threshold="0.0" MZ="474.761474031899979" RT="1815.299999999999955" spectrum_reference="controllerType=0 controllerNumber=1 scan=3727" >
<PeptideHit score="2.89" sequence="LSGATLQMK" charge="2" aa_before="K" aa_after="R" start="48" end="56" protein_refs="PH_9943" >
<UserParam type="string" name="MS:1002258" value="6"/>
<UserParam type="string" name="MS:1002259" value="16"/>
<UserParam type="string" name="num_matched_peptides" value="1060"/>
<UserParam type="int" name="isotope_error" value="0"/>
<UserParam type="float" name="MS:1002252" value="1.116"/>
<UserParam type="float" name="MS:1002253" value="1.0"/>
<UserParam type="float" name="MS:1002254" value="0.0"/>
<UserParam type="float" name="MS:1002255" value="113.900000000000006"/>
<UserParam type="float" name="MS:1002256" value="11.0"/>
<UserParam type="float" name="MS:1002257" value="2.89"/>
<UserParam type="string" name="target_decoy" value="decoy"/>
<UserParam type="string" name="protein_references" value="unique"/>
</PeptideHit>
</PeptideIdentification>
Yes so the problem is that we actually use the Comet e-value as main score.
So you are just lucky that you picked a score that is not a main score for the other search engines.
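One way to picture a workaround is to copy the main score into a meta value before re-scoring, so that it survives when the main score is replaced. The sketch below does this on a raw idXML fragment with the standard library. It is only an illustration of the idea under that assumption, not how OpenMS or PercolatorAdapter handle it, and the function name `keep_main_score` is hypothetical.

```python
import xml.etree.ElementTree as ET


def keep_main_score(pep_id: ET.Element) -> ET.Element:
    """Copy each hit's main score into a UserParam named after score_type."""
    score_type = pep_id.get("score_type")
    for hit in pep_id.findall("PeptideHit"):
        existing = {p.get("name") for p in hit.findall("UserParam")}
        # Skip hits that already carry the score as a meta value.
        if score_type not in existing:
            ET.SubElement(
                hit,
                "UserParam",
                {"type": "float", "name": score_type, "value": hit.get("score")},
            )
    return pep_id


# Abbreviated SAGE record from before re-scoring.
SAGE_BEFORE = """<PeptideIdentification score_type="hyperscore" higher_score_better="true">
  <PeptideHit score="4.372861652440132" sequence="LLGPSLTSTTPASSSSGSSSR" charge="2">
    <UserParam type="string" name="matched_peaks" value="27"/>
  </PeptideHit>
</PeptideIdentification>"""

pep_id = keep_main_score(ET.fromstring(SAGE_BEFORE))
params = {p.get("name"): p.get("value") for p in pep_id.find("PeptideHit").findall("UserParam")}
print(params["hyperscore"])
```

Calling the function a second time adds nothing, because the `hyperscore` UserParam is then already present.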
I think this problem is solved. You should take the data @WangHong007 from the SAGE id folder.
@timosachsenberg I'm re-opening this PR because, after testing http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/PXD004683/percolator/, the error remains. Can you double-check why the hyperscore is not included in the Percolator SAGE output?
Percolator SAGE output: http://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-benchmark/PXD004683/percolator/20150820_Haura-Pilot-TMT1-bRPLC01-2_sage_perc.idXML
according to pipeline_info it is still using the old container
This is percolator no?
yes PercolatorAdapter
I honestly think we should just override the containers for all openms labelled processes until the release. I.e. make the dev profile active by default. Otherwise someone will always forget to change a process.
I actually think we should make the containers used in every process a variable. Would that be possible? Something like:
openms_conda_string = "bioconda::openms=2.9.1"
openms_singularity_string = "ghcr.io/openms/openms-executables-sif:latest"
openms_docker_string = "ghcr.io/openms/openms-executables:latest"
I don't like it very much. You will just get confused because suddenly conda uses something different from docker etc. It also confuses users with yet another THREE parameters.
The only thing you will ever want is dev or latest. Nothing else.
I have no idea what you have in mind. How can you make a profile the default? Can you send me an example so I can do it?
You mean something like this:
yes. just put it in base.config. The thing is just to remember to remove it when releasing
I was actually thinking of leaving it there, but then importing it in nextflow.config or not, depending on the release cycle. Like this in nextflow.config:
includeConfig 'conf/dev.config'
What do you think?
Yes but you need to find out if and how nextflow knows about its release cycle ;)
If it cannot know about it, then having to change one line every release is not much better than just changing 3 lines.