SIRIUS 6: No Id field in the output summary files
Closed this issue · 5 comments
Dear SIRIUS development team.
Thanks a lot for developing SIRIUS!
I tested the new SIRIUS 6 last night. I did a compound annotation task
using more than 3000 .ms files of MS2 data obtained from MassBank.
I tried to obtain search results from the output summary files and found
that the structure_identifications.tsv and all other files do not have an "id" field.
The compound_identifications.tsv file generated by SIRIUS5
included the "compound" information in ".ms" file, like
compound MSBNK-BGC_Munich-RP000301
in the "Id" field, which was very convenient and essential to retrieve the search result.
It would be my great pleasure if this point is considered in the next update.
Thanks
Fumio
Hi,
there should be two id columns:
- alignedFeatureId is the internally used ID and uniquely identifies the corresponding feature. The number is currently a bit long and unreadable, though, because it is a unique identifier.
- there should also be a field named mappingFeatureId that contains an "external" feature ID. Whenever you import data from outside (e.g. from an MGF file), it should use the ID given in that file. For MGF this is the value of the FEATURE_ID field.
It seems that the name in the .ms file is currently not used for the mappingFeatureId, which is likely a bug we can fix with the next patch.
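To illustrate how the two ID columns could be used once mappingFeatureId is populated, here is a minimal Python sketch that groups summary rows by external ID and falls back to alignedFeatureId when the external ID is empty. The function name index_by_mapping_id is made up for this example, and the sketch assumes the summary is a tab-separated file with a header row containing both column names mentioned above.

```python
import csv
from typing import Dict, Iterable, List


def index_by_mapping_id(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group summary rows by external feature ID (hypothetical helper).

    Assumes a tab-separated summary with a header row that contains the
    columns mappingFeatureId and alignedFeatureId.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    index: Dict[str, List[dict]] = {}
    for row in reader:
        # Use the internal ID when no external ID was written for this row.
        key = row.get("mappingFeatureId") or row["alignedFeatureId"]
        index.setdefault(key, []).append(row)
    return index


# Usage sketch (file name taken from the report above):
# with open("structure_identifications.tsv", newline="") as fh:
#     hits = index_by_mapping_id(fh)
```

A feature can have several rows (one per candidate), which is why each key maps to a list.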
G'day @kaibioinfo Kaibio
I am using MSP as input and got the same formulaId, alignedFeatureId and mappingFeatureId, all corresponding to the SIRIUS ID. Which modification do you foresee in the MSP file to retain the external ID in the export summary?
Many thanks in advance.
MSP screenshot attached
Cheers,
Dear SIRIUS development team,
Following is an example of a query .ms file
MSBNK-RIKEN-PR300706.ms
compound MSBNK-RIKEN-PR300706_JDOFCMASVRMYJU
ionization [M+H]+
parentmass 182.0812
ms2
58.003700 1.0
60.992100 1.0
61.987400 2.0
63.007200 1.0
74.014800 2.0
91.053900 5.0
93.020500 1.0
95.049200 3.0
95.984200 1.0
104.044900 1.0
105.034400 1.0
119.048700 11.0
123.044200 14.0
129.089300 2.0
136.075000 136.0
141.989600 1.0
147.043300 44.0
148.047600 7.0
155.058800 1.0
163.085000 1.0
165.054000 999.0
168.066000 1.0
173.080700 1.0
182.080700 584.0
184.085600 2.0
SIRIUS 5 produces a "compound_identifications.tsv" file whose "id" field includes the file name and the >compound information of the query .ms file. It would be very helpful if SIRIUS 6 could do the same thing.
confidenceRank structurePerIdRank formulaRank #adducts #predictedFPs ConfidenceScore CSI:FingerIDScore ZodiacScore SiriusScore molecularFormula adduct InChIkey2D InChI name smiles xlogp pubchemids links dbflags ionMass retentionTimeInSeconds id featureId
1 1 1 1 1 0.9999893168023705 -352.9511203368887 N/A 79.72562984755662 C13H20BN3O5 [M + H]+ VDHUVZLEHHHMRS InChI=1S/C13H20BN3O5/c15-7-3-1-2-4-8-16-13(18)11-6-5-10(14(19)20)9-12(11)17(21)22/h5-6,9,19-20H,1-4,7-8,15H2,(H,16,18) Nahapba B(C1=CC(=C(C=C1)C(=O)NCCCCCCN)N+[O-])(O)O 1.10200095 133488 MeSH:(133488);PubChem:(133488);PubMed 70 309.1597543 NaN 4229_MSBNK-RIKEN-PR300706_0_MSBNK-RIKEN-PR300706_JDOFCMASVRMYJU N/A
Thanks!
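As a stopgap for matching result rows back to the query files, the compound name can be read directly from each .ms file. The following is only a sketch: the helper name compound_name is hypothetical, and it assumes the name sits on a line starting with "compound" (with or without a leading ">"), as in the example above.

```python
from typing import Optional


def compound_name(ms_text: str) -> Optional[str]:
    """Return the compound name from the text of a .ms file, if present.

    Hypothetical helper; accepts both "compound NAME" and ">compound NAME".
    """
    for line in ms_text.splitlines():
        # Drop an optional leading ">" and split the keyword from its value.
        fields = line.strip().lstrip(">").split(None, 1)
        if len(fields) == 2 and fields[0] == "compound":
            return fields[1].strip()
    return None
```

Calling this on every input file gives a file-name-to-compound lookup that can be compared against whatever ID ends up in the summary.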
In the .mat format, which is an extension of the .msp format, the field that corresponds to a feature ID is called PEAKID.
Long story short, if you add a PEAKID field to your features in .msp/.mat files, SIRIUS should treat it as the mappingFeatureId.
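A small Python sketch of that suggestion: it adds a PEAKID line to every MSP record that does not have one yet, reusing the record's name as the ID. The function name add_peak_ids is made up here, and the sketch assumes records are separated by blank lines, that each record has a "NAME:" line, and that the field is written as "PEAKID: value"; please check these against your own files.

```python
import re
from typing import List


def add_peak_ids(msp_text: str) -> str:
    """Insert a PEAKID line after the NAME line of records lacking one.

    Hypothetical helper; assumes blank-line-separated records and the
    "KEY: value" field layout.
    """
    records: List[str] = []
    for record in re.split(r"\n\s*\n", msp_text.strip()):
        lines = record.splitlines()
        has_id = any(l.upper().startswith("PEAKID:") for l in lines)
        out: List[str] = []
        for line in lines:
            out.append(line)
            if not has_id and line.upper().startswith("NAME:"):
                # Reuse the record name as the external feature ID.
                name = line.split(":", 1)[1].strip()
                out.append(f"PEAKID: {name}")
        records.append("\n".join(out))
    return "\n\n".join(records) + "\n"
```

Records that already carry a PEAKID are left untouched, so running the script twice does not duplicate the field.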
We decided to implement a fallback mappingFeatureId with information similar to the id in SIRIUS 5. It will be written to the summaries if no other mappingFeatureId has been given in the input.