MetabolicAtlas/standard-GEM

External standard subsystem for each reaction to make it comparable from different GEMs


I have run into an issue in model comparison: the subsystem assigned to the same reaction differs between models. I think standard-GEM should make subsystems comparable. Currently, the GEMs from modelSeed use a standard set of subsystems, which makes working with the models easier.

@hongzhonglu It does make sense to standardize subsystems and make them comparable across GEMs.

Most subsystems in the available GEMs were adapted from the names of KEGG pathways. How similar are the modelSeed subsystems to KEGG pathways?

While some standardization here could indeed be useful, I would argue that it is outside the scope of standard-GEM. People will have reasons to name subSystems in different ways, and most likely no single naming system will satisfy everyone. Instead, one can annotate reactions with, e.g., KEGG pathways (support should be added to COBRAToolbox, similar to opencobra/cobratoolbox#1591). Are modelSeed subsystems also in identifiers.org?

We are also not instructing what reaction identifiers to use. Allowing flexibility in standard-GEM (unless there are clear standards, like identifiers.org annotations) is part of its appeal.

I agree with @edkerk - the focus of standard-GEM at the moment essentially stops at the file tree of a repository. There have been many approaches targeting standardization of specific file formats, and the content of the respective files. Maybe this is a place where standard-GEM could contribute at some point in the far future.

Subsystem comparison sounds like a nice tool for a website. @hongzhonglu I hope you don't mind that we borrow this idea for the roadmap of Metabolic Atlas. Such feature requests are much appreciated over at the Met Atlas repository.

> the focus of standard-GEM at the moment essentially stops at the file tree of a repository.

This was not stated anywhere before, and it should be put somewhere more obvious (README or issue template) for contributors.

The issue template is "locked" because whatever issue template is defined on the main branch is used both for the creation of new issues and passed on to people who click "Use this template" to set up their own repository. Similarly, README.md is set up for templating purposes.

Even though this specific issue is something we cannot focus on at the moment, I still think it's valuable to have an open discussion. Therefore I would propose to not "exclude" such issues, or close them, but to keep them in the Backlog.

@Hao-Chalmers how about creating a new issue that describes the current focus of standard-GEM, and pinning it at the top of the Issues page?

> how about creating a new issue that describes the current focus of standard-GEM, and pinning it at the top of the Issues page?

Great idea

@hongzhonglu I wonder if you have any ideas or plans for standardizing subsystems. Are you suggesting that we adopt the modelSeed approach?

Hi Hao, I am not sure whether modelSeed is used. Might it be better to also use KEGG or MetaCyc?

To answer @hongzhonglu's original question, let me take one step back and describe how the subsystem information has been stored in SBML before proposing a more structured way of dealing with them.

Originally, subsystem information was written into the reaction's notes, resulting in the redundant mentioning of the same subsystem in every reaction belonging to that subsystem. The new release of the BiGG Models database in 2015 introduced a new approach to redefining the use of subsystems in SBML. BiGG then used the groups extension from SBML to declare a group of reactions for every subsystem. Those groups contain members, each of which points to a reaction within that subsystem. With this approach, each subsystem was only specified once, and at the same time, one reaction could now belong to multiple subsystems.
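To make the groups approach concrete, here is a minimal sketch that reads subsystem membership from a heavily trimmed SBML fragment using only the Python standard library. The model, the reaction IDs, and the group IDs are hypothetical toy data and are not taken from BiGG; the namespace URIs and the `groups:idRef` attribute are those of SBML Level 3 and version 1 of the groups package.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of the SBML Level 3 groups package (version 1).
GROUPS_NS = "http://www.sbml.org/sbml/level3/version1/groups/version1"
NS = {"groups": GROUPS_NS}

# Hypothetical toy model, trimmed for illustration (not a complete, valid SBML file).
SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1"
      level="3" version="1" groups:required="false">
  <model id="toy">
    <listOfReactions>
      <reaction id="R_PGI" reversible="true"/>
      <reaction id="R_G6PDH" reversible="false"/>
    </listOfReactions>
    <groups:listOfGroups>
      <groups:group groups:id="g1" groups:name="Glycolysis" groups:kind="partonomy">
        <groups:listOfMembers>
          <groups:member groups:idRef="R_PGI"/>
        </groups:listOfMembers>
      </groups:group>
      <groups:group groups:id="g2" groups:name="Pentose phosphate pathway" groups:kind="partonomy">
        <groups:listOfMembers>
          <groups:member groups:idRef="R_PGI"/>
          <groups:member groups:idRef="R_G6PDH"/>
        </groups:listOfMembers>
      </groups:group>
    </groups:listOfGroups>
  </model>
</sbml>
"""


def subsystems_by_reaction(sbml_text):
    """Map each reaction ID to the names of all groups that list it as a member."""
    root = ET.fromstring(sbml_text)
    result = defaultdict(list)
    for group in root.iterfind(".//groups:group", NS):
        # Namespaced attributes are stored under "{namespace-uri}localname".
        name = group.get(f"{{{GROUPS_NS}}}name")
        for member in group.iterfind("groups:listOfMembers/groups:member", NS):
            result[member.get(f"{{{GROUPS_NS}}}idRef")].append(name)
    return dict(result)


mapping = subsystems_by_reaction(SBML)
print(mapping)
```

Each subsystem name appears once in the file, while `R_PGI` is listed under two groups, which is the multi-membership that reaction notes could not express cleanly.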

As an additional advantage, we can annotate every group that represents such a subsystem. This means that we can add references in the form of controlled vocabulary terms to the subsystems and refer to KEGG and diverse other pathway databases. With this, it is possible to identify canonical pathways or subsystems across multiple models.
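A rough sketch of how annotated groups could be matched across models: subsystems whose names differ are paired when they share an annotation URI. The two dictionaries are hypothetical stand-ins for what would be extracted from each model's annotated groups; the URIs follow the identifiers.org pattern for KEGG pathways (map00010 is glycolysis/gluconeogenesis, map00020 is the citrate cycle).

```python
# Hypothetical subsystem groups from two models: names differ, annotations agree.
model_a = {
    "Glycolysis/Gluconeogenesis": {"https://identifiers.org/kegg.pathway/map00010"},
    "TCA cycle": {"https://identifiers.org/kegg.pathway/map00020"},
}
model_b = {
    "glycolysis": {"https://identifiers.org/kegg.pathway/map00010"},
    "Citric acid cycle": {"https://identifiers.org/kegg.pathway/map00020"},
    "Transport": set(),  # no annotation, so it cannot be matched
}


def match_subsystems(a, b):
    """Pair subsystem names from two models that share at least one annotation URI."""
    pairs = []
    for name_a, uris_a in a.items():
        for name_b, uris_b in b.items():
            if uris_a & uris_b:  # non-empty intersection of annotation sets
                pairs.append((name_a, name_b))
    return sorted(pairs)


matches = match_subsystems(model_a, model_b)
for name_a, name_b in matches:
    print(f"{name_a!r} <-> {name_b!r}")
```

Matching on URIs rather than names leaves each model free to keep its own naming scheme, which fits the flexibility argued for earlier in this thread.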

Furthermore, the idea of separating annotations from models (see https://doi.org/10.1093/bib/bby087) allows us to even store subsystem annotations in an external glossary file. Maybe those techniques can help solve such problems and make models more comparable. Let's suggest using SBML groups and annotating them, possibly in separate glossary files.
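As an illustration only, a glossary file could be as simple as a tab-separated table keyed by group ID. The column names and layout below are an assumption made for this sketch and are not the format described in the cited paper; `bqbiol:is` is one of the standard BioModels qualifiers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical glossary content; in practice this would live in its own file
# next to the model. Column names are an assumption of this sketch.
ROWS = [
    ("group_id", "qualifier", "uri"),
    ("g1", "bqbiol:is", "https://identifiers.org/kegg.pathway/map00010"),
    ("g2", "bqbiol:is", "https://identifiers.org/kegg.pathway/map00030"),
]
GLOSSARY_TSV = "\n".join("\t".join(row) for row in ROWS)


def load_glossary(text):
    """Read a TSV glossary into {group_id: [(qualifier, uri), ...]}."""
    annotations = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        annotations[row["group_id"]].append((row["qualifier"], row["uri"]))
    return dict(annotations)


glossary = load_glossary(GLOSSARY_TSV)
print(glossary["g1"])
```

Keeping the table outside the model means subsystem annotations can be curated and versioned without touching the SBML file itself.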

@draeger Separating model annotations into separate glossary files sounds like a great idea.