sourcey/moxygen

OOM issues

Closed this issue · 0 comments

eLod commented

unfortunately my last pr #33 broke quite a few things:

  • --classes no longer generates all the classes
  • memory usage is very high, as it keeps all the parsed xml structures in memory

i looked around and identified the bug behind the first issue (it is about compounds with the same name but a different kind), but i'm not sure how to progress on the second one. i have ~250MB of xmls (around 2500 files), and i tested that just keeping all the xml structures in memory leads to 3GB memory usage (the xml2js output alone), so holding everything in memory probably won't work.

for context: when generating with --groups, the ref links need to "know" where to point (e.g. whether, and which, external markdown file will include the referenced compound). the problem is that the group "hierarchy" is established while parsing the compound xml: class A may reference class B (or a function inside it) while class B is not yet parsed, and thus the group it belongs to is not yet known (edit: to be precise, the group hierarchy information comes from the group xml, not the class xml). note that --classes does not have the same problem, as all the needed information is already parsed from the index.

i can think of two alternatives currently:

  • parsing the files in 2 passes: first parse only the index and the information needed to build the group hierarchy, then process all the compounds again and create the markdown representation. the downside is that the second pass is only needed when using --groups, which is not really a parser concern
  • reverting the parsing and markdown generation to how they were before my pr, and placing markers for anchors inside the generated markdown content, so only the anchors need to be replaced in a second run. the downside is that it feels kind of like duct taping
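
to make the second alternative concrete, here is a rough sketch of the marker idea. it is not moxygen's actual code: the marker syntax `[[ref:<refid>|<text>]]`, the function names, and the shape of the compound objects are all made up for illustration. the point is only that the first run can handle one compound at a time and keep just a small refid-to-file map, and the second run rewrites the markers once the map is complete.

```javascript
// Hypothetical marker format (an assumption, not moxygen's): [[ref:<refid>|<link text>]]
const MARKER = /\[\[ref:([^|\]]+)\|([^\]]*)\]\]/g;

// First run: render a single compound. Links to other compounds are emitted as
// markers, because the target's group (and thus its file) may not be known yet.
// Only the small refid -> file map is retained, not the parsed xml.
function renderCompound(compound, fileByRefid) {
  fileByRefid.set(compound.refid, compound.group + '.md');
  const links = compound.refs
    .map((ref) => `[[ref:${ref.refid}|${ref.text}]]`)
    .join(', ');
  return `# ${compound.name}\n\nSee also: ${links}\n`;
}

// Second run: replace every marker with a real markdown link. Unknown refids
// fall back to plain text so no marker leaks into the final output.
function resolveMarkers(markdown, fileByRefid) {
  return markdown.replace(MARKER, (match, refid, text) => {
    const file = fileByRefid.get(refid);
    return file ? `[${text}](${file}#${refid})` : text;
  });
}

// Made-up input: class A references class B before B has been rendered.
const compounds = [
  { refid: 'classA', name: 'A', group: 'core', refs: [{ refid: 'classB', text: 'B' }] },
  { refid: 'classB', name: 'B', group: 'util', refs: [] },
];

const fileByRefid = new Map();
const rendered = compounds.map((c) => renderCompound(c, fileByRefid));
const resolved = rendered.map((md) => resolveMarkers(md, fileByRefid));
```

in a real run the output of the first run would presumably be written to disk per compound and read back for the second run, rather than held in an array as it is here.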

do you have any preference, or other ideas?