ElucidataInc/ElMaven

compound name suffixes

chubukov opened this issue · 5 comments

At some point, compound names from user-supplied compound databases started getting a suffix like "(1)" appended to them. This causes issues with some of our workflows that rely on matching by compound name.

What causes this? Is there a way to disable it?

@chubukov This happens when El-MAVEN finds two or more compounds that share the same name, ID, and database name but differ in other attributes (e.g., category, collision energy).

We had to do this to preserve the exact pairing between a peak group and a compound when restoring emDB sessions. It also lets users distinguish between peak groups of compounds with the same name and ID when they use MS/MS spectral libraries, which can contain many fragmentation spectra for the same compound acquired at different collision energies (CEs).

As of now, there is no way to disable it.

@saifulbkhan thanks. Is the original name stored somewhere, or can it be re-generated in a well-defined way? Could we do this during export?

Of course we could do s/\s*\(\d+\)$// at the tail end, but perhaps there's something more systematic.
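A minimal Python sketch of that tail-end substitution, using the same pattern as the regex above. The function name `strip_dedup_suffix` and the sample compound names are hypothetical. One caveat worth keeping in mind: the pattern cannot tell an El-MAVEN suffix from a legitimate trailing "(N)" in a user's own compound name, so it would strip both.

```python
import re

# Optional whitespace followed by "(digits)" anchored at the end of the name,
# equivalent to the substitution s/\s*\(\d+\)$// mentioned in the thread.
SUFFIX_RE = re.compile(r"\s*\(\d+\)$")


def strip_dedup_suffix(name: str) -> str:
    """Remove one trailing '(N)' suffix from a compound name, if present."""
    return SUFFIX_RE.sub("", name)


if __name__ == "__main__":
    print(strip_dedup_suffix("Glutamate (1)"))   # suffix removed
    print(strip_dedup_suffix("PC(16:0/18:1)"))   # no match: parentheses hold non-digits
```

Because the match has to be ambiguous in cases like "Standard (2)", reading the stored original name is the more systematic route.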

@chubukov Yes, the original name is still stored in the Compound object (in a property called originalName). It is also saved in the emDB, at least in recent versions. So if you use the emDB as the basis for export, you can extract the compound name exactly as it appeared in the originally supplied database.

Do you want me to make this change in the custom export script?

@saifulbkhan ok, that sounds good. Assuming it's really just pulling out the originalName property, I should be able to make the changes.

Thanks

@chubukov Alright. One note: if you extract this value from the emDB, you will need the original_name column (not originalName) from the compounds table.
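A rough sketch of how an export script might read that column, under several assumptions: that the emDB can be opened as an SQLite database, and that the compounds table also has a `name` column holding the displayed (possibly suffixed) name. Only the table `compounds` and the column `original_name` come from this thread; the `name` column, the function names, and the demo rows are hypothetical.

```python
import sqlite3


def name_to_original(conn: sqlite3.Connection) -> dict[str, str]:
    """Map each (possibly suffixed) compound name to its original name.

    ASSUMPTIONS: `compounds` and `original_name` are taken from the thread;
    the `name` column is hypothetical and stands for the displayed name.
    """
    mapping = {}
    for name, original in conn.execute("SELECT name, original_name FROM compounds"):
        # Fall back to the displayed name if original_name is NULL.
        mapping[name] = original if original is not None else name
    return mapping


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for an emDB, using a hypothetical two-column schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compounds (name TEXT, original_name TEXT)")
    conn.executemany(
        "INSERT INTO compounds VALUES (?, ?)",
        [
            ("Glutamate", "Glutamate"),
            ("Glutamate (1)", "Glutamate"),
            ("Citrate", "Citrate"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    # For a real session file, something like sqlite3.connect("session.emDB")
    # would replace the demo database (path is a placeholder).
    mapping = name_to_original(make_demo_db())
    print(mapping["Glutamate (1)"])
```

Since the thread notes the original name is saved "at least for the recent versions", an emDB from an older El-MAVEN release may lack the column entirely. In that case the query would raise `sqlite3.OperationalError`, and the regex fallback would be the remaining option.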