draeger-lab/ModelPolisher

Rewriting BiGGId handling


BiGGId parsing was rewritten because the old method had shortcomings when parsing ids containing many underscores, and because pseudoreaction ids were handled incorrectly: they were given an "R_" prefix, which is not in accordance with the specification.
The new method is regex-based. While most of the rewrite is finished, some smaller issues remain:
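To illustrate why a regex copes better with underscore-heavy ids than naive splitting, here is a minimal Java sketch (Java 16+ for the record). It is not ModelPolisher's implementation: the class `BiGGIdSketch`, its methods, the compartment pattern (one lowercase letter plus an optional letter or digit) and the list of pseudoreaction prefixes (`EX_`, `DM_`, `SK_`, `BIOMASS_`) are hypothetical assumptions. Following the issue text, pseudoreactions are returned without an "R_" prefix.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based BiGG id parsing; not ModelPolisher's actual class. */
final class BiGGIdSketch {

    // Optional "M_" prefix, then the abbreviation, then "_" plus a compartment code at the end.
    // The greedy abbreviation group backtracks to the LAST underscore that is followed by a
    // valid compartment code, so ids such as "glc__D_e" keep "glc__D" intact.
    private static final Pattern METABOLITE =
        Pattern.compile("^(?:M_)?(?<abbr>[A-Za-z0-9_]+)_(?<comp>[a-z][a-z0-9]?)$");

    // Assumed pseudoreaction prefixes (exchange, demand, sink, biomass); an optional "R_" is dropped.
    private static final Pattern PSEUDOREACTION =
        Pattern.compile("^(?:R_)?(?<abbr>(?:EX|DM|SK|BIOMASS)_[A-Za-z0-9_]+)$");

    /** Abbreviation and compartment code of a metabolite id. */
    record Parsed(String abbreviation, String compartment) {}

    /** Returns the parts of a metabolite id, or empty if the id does not fit the pattern. */
    static Optional<Parsed> parseMetabolite(String id) {
        Matcher m = METABOLITE.matcher(id);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Parsed(m.group("abbr"), m.group("comp")));
    }

    /** Pseudoreactions are returned without "R_"; all other reactions get exactly one "R_". */
    static String reactionId(String id) {
        Matcher m = PSEUDOREACTION.matcher(id);
        if (m.matches()) {
            return m.group("abbr");
        }
        return id.startsWith("R_") ? id : "R_" + id;
    }

    public static void main(String[] args) {
        Parsed p = parseMetabolite("M_glc__D_e").orElseThrow();
        System.out.println(p.abbreviation() + " / " + p.compartment()); // glc__D / e
        System.out.println(reactionId("R_EX_glc__D_e"));                 // EX_glc__D_e
        System.out.println(reactionId("PGI"));                           // R_PGI
    }
}
```

Pseudoreactions outside the assumed prefix list (for example `ATPM`) would still receive "R_" in this sketch, so the prefix list would have to be checked against the specification.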

  • Ids that cannot be interpreted as BiGGIds are currently stored internally as abbreviations, so that they are not lost, and are corrected as far as possible to form a valid SBase identifier. However, identifiers starting with digits are not handled correctly yet. Prepend an underscore?
  • Invalid characters are replaced in a way that should produce ids matching BiGGDB entries; however, this still needs to be verified by test cases.
  • Empty ids are not caught everywhere: annotating species produces an "M_M" id in most (all?) models. This needs some investigation, as I haven't found where it happens yet. Maybe throw an exception and print the stack trace when this happens, for debugging.
  • Running the new code shows duplicate entries for some pseudoreactions in bigg_models_data, once with and once without the "R_" prefix. I'll open a corresponding issue with them once I have data on which models and reactions are affected.
  • The new implementation is written so that adding prefixes to ids could be made a user choice, as proposed in #28, without much extra work.
  • Some models contain a compartment code prefix "C_". Should it simply be stripped, or kept as an additional prefix that is not part of the specification?
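Several of the open points above (digit-leading ids, invalid-character replacement, empty ids, optional prefixes) can be sketched together in one hypothetical helper. SBML's SId syntax requires the first character to be a letter or underscore, followed only by letters, digits and underscores, so the helper prepends an underscore to digit-leading ids as suggested above. The class `SIdSanitizer`, the mapping of `-` to `__` (modelled on BiGG ids such as `glc__D`) and the replacement of every other invalid character with `_` are assumptions that would still need the test cases mentioned above. An empty `prefix` means no prefix is added, i.e. the user choice from #28; the "C_" question is left open.

```java
import java.util.Objects;

/** Hypothetical helper that turns an arbitrary id into a valid SBML SId; names are illustrative. */
final class SIdSanitizer {

    /**
     * @param raw    id as read from the model
     * @param prefix e.g. "M_" or "R_"; pass "" to skip prefixing (user choice, cf. #28)
     */
    static String toSId(String raw, String prefix) {
        Objects.requireNonNull(raw, "id must not be null");
        Objects.requireNonNull(prefix, "prefix must not be null");
        if (raw.isBlank()) {
            // Fail loudly so the stack trace points at the caller that passed the empty id
            throw new IllegalArgumentException("empty id passed to toSId");
        }
        StringBuilder sb = new StringBuilder();
        for (char ch : raw.trim().toCharArray()) {
            if (ch == '-') {
                sb.append("__");   // assumed BiGG convention: glc-D -> glc__D
            } else if (isIdChar(ch)) {
                sb.append(ch);     // already valid in an SId
            } else {
                sb.append('_');    // assumed replacement for any other invalid character
            }
        }
        String id = prefix + sb;
        // SId grammar: the first character must be a letter or an underscore
        char first = id.charAt(0);
        if (first >= '0' && first <= '9') {
            id = "_" + id;
        }
        return id;
    }

    /** ASCII letter, digit or underscore, as allowed by the SId grammar. */
    private static boolean isIdChar(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
            || (ch >= '0' && ch <= '9') || ch == '_';
    }

    public static void main(String[] args) {
        System.out.println(toSId("10fthf_c", ""));   // _10fthf_c
        System.out.println(toSId("10fthf_c", "M_")); // M_10fthf_c
        System.out.println(toSId("glc-D", "M_"));    // M_glc__D
    }
}
```

With a non-empty prefix the underscore is never needed, because the prefix already starts with a letter; it only matters when prefixing is switched off.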

This should be done for now; I might reopen the issue if problems come up after the beta release.