finos/legend-sdlc

SDLC performance degradation due to persistence format strategy

akphi opened this issue · 0 comments

akphi commented

As of project structure version 8, SDLC stores entities as grammar text. This has many upsides, but it hurts performance significantly.

  • On the write path, it parses and composes multiple times. For example, to find out whether an entity can be stored as text, it first composes the entity to grammar text, then parses that text back and does a roundtrip comparison to make sure nothing was lost. If the comparison passes, it proceeds with the text form; otherwise, it stores the entity in JSON. This means that when we read the entity, we have to look up where it could have been stored based on its format (please fact-check this).
  • On the read path, it parses the entity from text to JSON, which roughly doubles the size because source information is returned as well. As a result, entities published to the metadata server (a.k.a. depot) also contain source information, which blows up the size of models by 100%, if not 200%!
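To make the write path described above concrete, here is a minimal sketch in Python. It is not the actual SDLC code: `choose_storage_format`, `toy_compose`, and `toy_parse` are hypothetical names, and the toy grammar only knows about an entity's `name`. It shows only the roundtrip idea, where an entity is stored as text when compose followed by parse gives back the same entity, and falls back to JSON otherwise.

```python
import json
from typing import Any, Callable, Dict, Tuple

Entity = Dict[str, Any]


def choose_storage_format(
    entity: Entity,
    compose: Callable[[Entity], str],
    parse: Callable[[str], Entity],
) -> Tuple[str, str]:
    """Return (format, payload) for an entity.

    Hypothetical helper: compose the entity to text, parse it back, and
    keep the text form only if the roundtrip reproduces the entity.
    """
    try:
        text = compose(entity)
        if parse(text) == entity:
            return ("text", text)
    except Exception:
        # Any compose/parse failure is treated as "not storable as text".
        pass
    return ("json", json.dumps(entity, sort_keys=True))


def toy_compose(entity: Entity) -> str:
    # Toy grammar (assumption): only the entity name is representable.
    return f"Class {entity['name']}"


def toy_parse(text: str) -> Entity:
    # Inverse of toy_compose for the toy grammar.
    keyword, name = text.split()
    if keyword != "Class":
        raise ValueError(f"unexpected keyword: {keyword}")
    return {"name": name}


if __name__ == "__main__":
    print(choose_storage_format({"name": "Person"}, toy_compose, toy_parse))
    print(choose_storage_format({"name": "Person", "extra": 1}, toy_compose, toy_parse))
```

Note that every write pays for one compose and one parse before anything is stored, which is the overhead this issue is about.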

Therefore, we could consider:

  • Introducing a mechanism for the SDLC server to prune source information while parsing grammar text.
  • Introducing a mechanism for SDLC to store entities in JSON only, perhaps via a config at the project structure level.
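As a rough illustration of the first option, the sketch below prunes source information from an already parsed entity. It is written in Python for brevity and is not the SDLC implementation. The function name `prune_source_information` is hypothetical, and the key name `sourceInformation` is an assumption about how the parser output labels this data; it would need to be checked against the real protocol JSON.

```python
import json
from typing import Any


def prune_source_information(node: Any, key: str = "sourceInformation") -> Any:
    """Return a copy of a JSON-like structure with every `key` entry removed.

    `key` defaults to "sourceInformation", which is an assumed field name.
    """
    if isinstance(node, dict):
        return {
            k: prune_source_information(v, key)
            for k, v in node.items()
            if k != key
        }
    if isinstance(node, list):
        return [prune_source_information(v, key) for v in node]
    # Scalars (str, int, bool, None, ...) are returned unchanged.
    return node


if __name__ == "__main__":
    # Made-up entity content, for illustration only.
    entity = {
        "name": "Person",
        "sourceInformation": {"startLine": 1, "endLine": 4},
        "properties": [
            {"name": "age", "sourceInformation": {"startLine": 2, "endLine": 2}},
        ],
    }
    pruned = prune_source_information(entity)
    print(len(json.dumps(entity)), "->", len(json.dumps(pruned)))
```

Pruning after parsing still pays the cost of producing the source information in the first place, so a parser-level switch that never emits it would likely be cheaper, if one can be added.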