Syntax errors in a few .yaml files
Closed this issue · 6 comments
While reading .yaml files for indexing with Elasticsearch, I found syntax errors in the genes fields of 8 models, similar to the following:
iSyn731/reactions.yaml:
- genes: (sll1102 and sll1103and sll1104)
+ genes: (sll1102 and sll1103 and sll1104)
iRS1597/reactions.yaml:
- genes: AT2G28850 or AT2G28860 or or AT2G34490 or AT2G34500
+ genes: AT2G28850 or AT2G28860 or AT2G34490 or AT2G34500
The git diff for my modifications is more than 200 lines. I can copy-paste it here or make a pull request, but I'm not sure about many of my changes, for example whether the following added operator should be "and" or "or":
iCB925/reactions.yaml:
- genes: ( Cbei_0661 Cbei_2182 )
+ genes: ( Cbei_0661 or Cbei_2182 )
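For anyone who wants to scan genes fields for this kind of problem automatically, here is a minimal sketch of a checker. It is not part of PSAMM; the function name check_gene_expression and the rule set are assumptions made for illustration, and it only understands parentheses and the operators and/or.

```python
import re

# Hypothetical helper, not a PSAMM function: flags malformed gene associations.
OPERATORS = {"and", "or"}
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def check_gene_expression(expr):
    """Return a list of problems found in a gene association string."""
    problems = []
    depth = 0
    prev = None  # kind of the previous token: "gene", "op", "(" or ")"
    for tok in TOKEN_RE.findall(expr):
        if tok == "(":
            if prev in ("gene", ")"):
                problems.append("missing operator before '('")
            depth += 1
            kind = "("
        elif tok == ")":
            if prev in ("op", "("):
                problems.append("unexpected ')'")
            depth -= 1
            if depth < 0:
                problems.append("unbalanced ')'")
                depth = 0
            kind = ")"
        elif tok.lower() in OPERATORS:
            # An operator must follow a gene id or a closing parenthesis.
            if prev in (None, "op", "("):
                problems.append(f"unexpected operator '{tok}'")
            kind = "op"
        else:
            # Two operands in a row mean an operator is missing between them.
            if prev in ("gene", ")"):
                problems.append(f"missing operator before '{tok}'")
            kind = "gene"
        prev = kind
    if prev == "op":
        problems.append("trailing operator")
    if depth > 0:
        problems.append("unbalanced '('")
    return problems


for expr in [
    "(sll1102 and sll1103and sll1104)",
    "AT2G28850 or AT2G28860 or or AT2G34490 or AT2G34500",
    "( Cbei_0661 Cbei_2182 )",
]:
    print(expr, "->", check_gene_expression(expr))
```

Note that a fused token such as sll1103and is read as a single identifier, so the first example is reported as a missing operator before sll1104 rather than as a missing space. The checker can only flag the line; choosing between and and or still needs the literature.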
I have also noticed that not all models have unique names defined; e.g., the name SBML is used in about a dozen models.
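A small sketch of how shared names could be detected before indexing. It assumes the name of each model has already been read from its model.yaml into a plain dict keyed by model directory; the function name find_shared_names and the sample values are hypothetical.

```python
from collections import defaultdict


def find_shared_names(model_names):
    """Map each name used by more than one model to the models that use it.

    model_names: dict of model directory -> value of the 'name' property.
    """
    by_name = defaultdict(list)
    for model_dir, name in model_names.items():
        by_name[name].append(model_dir)
    return {name: sorted(dirs) for name, dirs in by_name.items() if len(dirs) > 1}


# Illustrative values only; the real names come from each model.yaml.
sample = {"iSyn731": "SBML", "iCB925": "SBML", "iRS1597": "iRS1597"}
print(find_shared_names(sample))
```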
Hi @uludag,
Thank you for bringing this to our attention. After reviewing the two examples of gene logic errors that you provided, it appears that these errors were present in the original published SBML versions of the models. During import, missing operators and missing spaces are difficult to detect and correct in an automated fashion, and as a result they were carried over to the YAML versions of the models.
If it would not be too much trouble, opening a pull request for your gene association changes would be very helpful in correcting these errors.
For the errors that have missing operators we will perform a comparison to the available literature associated with the models to determine what the appropriate operator would be and incorporate the changes accordingly.
Thank you for pointing out the name property as well. By default, when a model is imported using the PSAMM importing functions and the original SBML file does not define a model name, the name property in the model.yaml file is simply written as SBML. When indexing, this could cause problems in differentiating the models. We will consider changing the import functions so that the default name is something more specific, such as the original SBML file name. Instead of 'name: SBML' the model.yaml would have something like 'name: iCac802' (for that specific model).
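The proposed fallback could look roughly like the sketch below. This is not the PSAMM import code; the function default_model_name and the example path are assumptions used only to illustrate taking the file's base name when the SBML file defines no model name.

```python
from pathlib import Path


def default_model_name(sbml_name, sbml_path):
    """Return the name defined in the SBML file, or the file's base name.

    sbml_name: model name read from the SBML file, or None if undefined.
    sbml_path: path of the SBML file being imported (hypothetical layout).
    """
    if sbml_name and sbml_name.strip():
        return sbml_name.strip()
    # Fall back to the file name without its extension.
    return Path(sbml_path).stem


print(default_model_name(None, "models/iCac802.xml"))
```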
Thank you again for pointing this out. We will work on updating both the model collection for immediate use and improving the importing functions for future use using this information.
Best,
Keith Dufault-Thompson
Hi Keith,
Thanks for returning to my comments.
I made a pull request with changes that looked obvious!
#26
Other than the above changes:
- There is a missing operator between genes in reaction FDXNRy of model iCB925
- genes: ( Cbei_0661 Cbei_2182 )
+ genes: ( Cbei_0661 or Cbei_2182 )
- Model iMA945, reactions with ids ALPATE160pp and ALPATG160pp do not have equation fields defined
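A check for the second item can be sketched as follows. It assumes reactions.yaml has already been parsed into a list of dicts (for example with a YAML library); the function name and the sample equation string are hypothetical.

```python
def reactions_missing_equation(reactions):
    """Return the ids of reaction entries with no (or an empty) equation field."""
    return [r.get("id") for r in reactions if not r.get("equation")]


# Illustrative entries only; the equation text is a placeholder.
sample = [
    {"id": "ALPATE160pp", "name": "example reaction without an equation"},
    {"id": "ALPATG160pp", "equation": None},
    {"id": "R_ok", "equation": "A => B"},
]
print(reactions_missing_equation(sample))
```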
--mahmut
Hi Mahmut,
We have reviewed the issues noted with the gene associations in the models and have made the appropriate changes on a new branch in the repository called '08-2017-model-corrections', which you can check out and use for your analysis.
This branch contains the changes to the model names to replace the generic model names that were imported as just SBML with the appropriate model IDs.
We have also reviewed and incorporated the noted gene association changes into this branch for the models you noted, after consulting the original literature. These errors appear either to have been present in the original models or to have been introduced during the SBML conversion process in the original publications.
We will keep these changes on a separate branch for now. We plan to perform an additional systematic review of these models to identify and investigate any further problems before integrating the changes into the main repository.
If you have any additional questions feel free to contact us.
Thank you,
Keith Dufault-Thompson
Hi Keith,
I have checked out the new branch and tested my index script with your updates; sbml.SBMLWriter.write_model() calls in my script now return successfully for all the models except model sco2013 (S_coelicolor_fixed). With this model, 116 compounds have undefined compartment fields when the write_model() function is iterating over compounds in the reactions' equation properties. Since I am not able to fix the model, I made a modification in my local write_model() function to check whether compound.compartment is not None before making the _make_safe_id(compound.compartment) calls. This required another change further down in the same function to check that species.compartment is not None before setting compartment tags.
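The shape of such a guard can be sketched as below. This is not the PSAMM write_model() code: Compound, make_safe_id and species_attributes are simplified stand-ins written for illustration, and the id prefixes are assumptions.

```python
import re
from collections import namedtuple

# Simplified stand-in for a compound with an optional compartment.
Compound = namedtuple("Compound", ["name", "compartment"])


def make_safe_id(text):
    """Hypothetical helper: replace characters that are unsafe in an id."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text)


def species_attributes(compound):
    """Build species attributes, skipping the compartment when it is None."""
    attrs = {"id": "M_" + make_safe_id(compound.name)}
    if compound.compartment is not None:
        attrs["compartment"] = "C_" + make_safe_id(compound.compartment)
    return attrs


print(species_attributes(Compound("atp", "c")))
print(species_attributes(Compound("atp", None)))
```

Omitting the attribute keeps the writer from failing on None, though the resulting file may not validate as strict SBML, since species are normally expected to declare a compartment.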
--mahmut
Hi Mahmut,
That model was imported from another model collection associated with opencobra. The original model from that collection does not appear to have complete information for many of its compounds and reactions. The import includes as much information as possible from the original file, but without manually editing the reactions it is not possible to determine from that source file which compartment each compound should be in.
I would suggest excluding that particular model from your analysis because of the missing information, unless it is absolutely required. The information in the model could still be used in some analyses, but the large amount of information that is not defined in the SBML file makes it difficult to use in conjunction with the other models in this collection.
The SBML file for that model can be found here: https://github.com/opencobra/m_model_collection/tree/2d3d0ab5115f4fca6b4a2cdb756d73586a510e5e
Best,
Keith
Hi Keith,
Thanks for your comments and for the link. I was trying to index a few metabolic network repositories and started with the PSAMM model collection as the first repository. One possible use of such indexes would be to search for occurrences of reactions or compounds in metabolic networks, similar to the MetaNetX compound and reaction info pages that show such results. I was not doing any analysis. Today I noticed the tutorial in your psamm-demo repository and will try to follow it to learn the concepts in metabolic modelling and pathway analysis.
--mahmut