sirius-ms/sirius

SIRIUS 6: No Id field in the output summary files

Closed this issue · 5 comments

Dear SIRIUS development team.

Thanks a lot for developing SIRIUS!

I tested the new SIRIUS 6 last night. I did a compound annotation task
using more than 3000 .ms files of MS2 data obtained from MassBank.

I tried to retrieve the search results from the output summary files and found
that structure_identifications.tsv and all the other files do not have an "id" field.
The compound_identifications.tsv file generated by SIRIUS 5
included the "compound" information from the .ms file, such as

compound MSBNK-BGC_Munich-RP000301

in the "id" field, which was very convenient and essential for retrieving the search results.

I would be very grateful if this could be considered in the next update.

Thanks

Fumio

Hi,
there should be two id columns:

  • alignedFeatureId is the internally used ID and uniquely identifies the corresponding feature. The number is currently a bit long and unreadable, though, because it is a unique identifier.
  • there should also be a field named mappingFeatureId that contains an "external" feature ID. Whenever you import data from outside (e.g. from an MGF file), it should use the ID given in that file. For MGF, this is the value of the FEATURE_ID field.

It seems that the name in the .ms file is currently not used for the mappingFeatureId, which is likely a bug we can fix with the next patch.
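For anyone who wants to look up results by external ID, the two columns described above could be used roughly as follows. This is only a sketch: the column names alignedFeatureId and mappingFeatureId are taken from this thread, the molecularFormula column and all sample values are made up for illustration, and the helper name index_by_mapping_id is hypothetical.

```python
import csv
import io

def index_by_mapping_id(tsv_file):
    """Group summary rows by mappingFeatureId, falling back to alignedFeatureId."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    index = {}
    for row in reader:
        # An empty or missing mappingFeatureId falls back to the internal ID.
        key = row.get("mappingFeatureId") or row.get("alignedFeatureId")
        index.setdefault(key, []).append(row)
    return index

# In-memory stand-in for structure_identifications.tsv (values are invented).
sample = (
    "alignedFeatureId\tmappingFeatureId\tmolecularFormula\n"
    "574839201\tfeature_A\tC6H12O6\n"
    "574839202\t\tC9H11NO2\n"
)
index = index_by_mapping_id(io.StringIO(sample))
```

With a real summary file, the same function can be called on `open(path, newline="")` instead of the in-memory sample.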

G'day @kaibioinfo Kaibio
I am using MSP as input and got the same formulaId, alignedFeatureId and mappingFeatureId, all corresponding to the SIRIUS ID. Which modification to the MSP file do you foresee to retain the external ID in the exported summary?
Many thanks in advance.
MSP screenshot attached

Cheers,

Dear SIRIUS development team,

The following is an example of a query .ms file:

MSBNK-RIKEN-PR300706.ms

compound MSBNK-RIKEN-PR300706_JDOFCMASVRMYJU
ionization [M+H]+
parentmass 182.0812
ms2
58.003700 1.0
60.992100 1.0
61.987400 2.0
63.007200 1.0
74.014800 2.0
91.053900 5.0
93.020500 1.0
95.049200 3.0
95.984200 1.0
104.044900 1.0
105.034400 1.0
119.048700 11.0
123.044200 14.0
129.089300 2.0
136.075000 136.0
141.989600 1.0
147.043300 44.0
148.047600 7.0
155.058800 1.0
163.085000 1.0
165.054000 999.0
168.066000 1.0
173.080700 1.0
182.080700 584.0
184.085600 2.0

SIRIUS 5 produces a "compound_identifications.tsv" file whose "id" field includes the file name and the ">compound" information of the query .ms file. It would be very helpful if SIRIUS 6 could do the same.

confidenceRank structurePerIdRank formulaRank #adducts #predictedFPs ConfidenceScore CSI:FingerIDScore ZodiacScore SiriusScore molecularFormula adduct InChIkey2D InChI name smiles xlogp pubchemids links dbflags ionMass retentionTimeInSeconds id featureId

1 1 1 1 1 0.9999893168023705 -352.9511203368887 N/A 79.72562984755662 C13H20BN3O5 [M + H]+ VDHUVZLEHHHMRS InChI=1S/C13H20BN3O5/c15-7-3-1-2-4-8-16-13(18)11-6-5-10(14(19)20)9-12(11)17(21)22/h5-6,9,19-20H,1-4,7-8,15H2,(H,16,18) Nahapba B(C1=CC(=C(C=C1)C(=O)NCCCCCCN)N+[O-])(O)O 1.10200095 133488 MeSH:(133488);PubChem:(133488);PubMed 70 309.1597543 NaN 4229_MSBNK-RIKEN-PR300706_0_MSBNK-RIKEN-PR300706_JDOFCMASVRMYJU N/A

Thanks!
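To make the request concrete, here is a rough sketch of how the compound name can be read from a .ms file and recovered from a SIRIUS 5 "id" value. The id layout `<number>_<file name>_<number>_<compound>` is only inferred from the single example row above and is an assumption, and both function names are hypothetical.

```python
def read_compound_name(ms_text):
    """Return the value of the first compound line of a .ms file, or None."""
    for line in ms_text.splitlines():
        # Accept both ">compound NAME" and "compound NAME".
        fields = line.strip().lstrip(">").split(None, 1)
        if len(fields) == 2 and fields[0] == "compound":
            return fields[1].strip()
    return None

def compound_from_sirius5_id(sirius5_id, file_stem):
    """Recover the compound name from an id assumed to look like
    <number>_<file_stem>_<number>_<compound>."""
    marker = "_" + file_stem + "_"
    _, found, rest = sirius5_id.partition(marker)
    if not found:
        return None
    # Drop the second numeric part; the remainder is the compound name.
    _, _, compound = rest.partition("_")
    return compound or None

ms_text = (
    "compound MSBNK-RIKEN-PR300706_JDOFCMASVRMYJU\n"
    "ionization [M+H]+\n"
    "parentmass 182.0812\n"
)
name_from_file = read_compound_name(ms_text)
name_from_id = compound_from_sirius5_id(
    "4229_MSBNK-RIKEN-PR300706_0_MSBNK-RIKEN-PR300706_JDOFCMASVRMYJU",
    "MSBNK-RIKEN-PR300706",
)
```

Because the file name is known, the compound name can be matched back even though both parts contain underscores.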

In the .mat format, which is an extension of the .msp format, the field that corresponds to a feature ID is called PEAKID.
Long story short, if you add a PEAKID field to your features in .msp/.mat files, SIRIUS should treat its value as the mappingFeatureId.
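A PEAKID field could be added to an existing MSP file with a small script along these lines. This is a sketch under several assumptions: records are separated by blank lines, fields use the usual "KEY: value" layout, and the NAME value is reused as the PEAKID. The function name add_peakid and the sample records are hypothetical.

```python
import re

def add_peakid(msp_text):
    """Insert a PEAKID line after NAME in every record that lacks one."""
    patched_records = []
    # Assumption: records are separated by one or more blank lines.
    for record in re.split(r"\n\s*\n", msp_text.strip()):
        lines = record.splitlines()
        has_peakid = any(l.upper().startswith("PEAKID:") for l in lines)
        patched = []
        for l in lines:
            patched.append(l)
            if not has_peakid and l.upper().startswith("NAME:"):
                # Assumption: reuse the NAME value as the external feature ID.
                patched.append("PEAKID: " + l.split(":", 1)[1].strip())
        patched_records.append("\n".join(patched))
    return "\n\n".join(patched_records) + "\n"

sample = (
    "NAME: cmpd_1\n"
    "PRECURSORMZ: 182.0812\n"
    "Num Peaks: 2\n"
    "58.0037 1\n"
    "165.054 999\n"
    "\n"
    "NAME: cmpd_2\n"
    "PEAKID: keep_me\n"
    "Num Peaks: 1\n"
    "91.0539 5\n"
)
patched = add_peakid(sample)
```

Records that already carry a PEAKID are left unchanged, so the script can be run repeatedly on the same file.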

We decided to implement a fallback mappingFeatureId with information similar to that in SIRIUS 5. It will be written to the summaries if no other mappingFeatureId has been given in the input.