ocsigen/js_of_ocaml

[BUG] js_of_ocaml is excessively memory hungry

JasonGross opened this issue · 20 comments

Describe the bug

js_of_ocaml is very cool! I use it on CI to generate a webpage. However, I cannot use it on the new GitHub Actions arm64 MacOS boxes, which have only 7 GB of RAM, because it sometimes eats 8--9 GB RAM to generate a single .js file. For example here is a table of build times and memory usages on linux:

     Time |   Peak Mem | File Name                                         
---------------------------------------------------------------------------
15m11.31s | 8904232 ko | Total Time / Peak Mem                             
---------------------------------------------------------------------------
 4m46.85s | 8808932 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.js       
 4m37.20s | 7151532 ko | ExtractionJsOfOCaml/fiat_crypto.js                
 4m30.42s | 8904232 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js  
 0m25.80s | 3195952 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.byte
 0m25.32s | 3193600 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.byte     
 0m24.60s | 2814364 ko | ExtractionJsOfOCaml/fiat_crypto.byte              
 0m00.38s |  102780 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.cmi      
 0m00.37s |   98468 ko | ExtractionJsOfOCaml/fiat_crypto.cmi               
 0m00.37s |  102360 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.cmi 
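To relate the table to the 7 GB limit, the peak-memory column can be converted to GiB. This is a minimal sketch, not part of the build: it assumes "ko" means units of 1024 bytes (as time-style tools usually report), and `ROWS`, `to_gib` and `over_limit` are illustrative names.

```python
# Peak memory values (in "ko") copied from the .js rows of the table above.
ROWS = {
    "bedrock2_fiat_crypto.js": 8808932,
    "fiat_crypto.js": 7151532,
    "with_bedrock2_fiat_crypto.js": 8904232,
}

LIMIT_GB = 7  # RAM available on the runner, per the report


def to_gib(kib: int) -> float:
    """Convert KiB to GiB (assumption: 1 ko = 1024 bytes)."""
    return kib / (1024 * 1024)


def over_limit(kib: int, limit_gb: float = LIMIT_GB) -> bool:
    """True if the peak exceeds the limit, treating the limit as decimal GB."""
    return kib * 1024 > limit_gb * 10**9


if __name__ == "__main__":
    for name, kib in ROWS.items():
        print(f"{name}: {to_gib(kib):.2f} GiB, over limit: {over_limit(kib)}")
```

Under these assumptions, each of the three .js targets peaks above the limit on its own, so running the jobs sequentially would not be enough.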

It is similar on macOS, and a bit better on Debian sid.

I invoke it with --source-map --no-inline --enable=effects and invoke the compiler with -package js_of_ocaml -package unix -w -20 -g

For the near future (until artifacts expire), the build artifacts page contains generated .js files (fiat-html-js-of-ocaml), .ml source files (ExtractionJsOfOCaml-source-master), and compiled files (ExtractionJsOfOCaml-master-ocaml-4.11.1).

Expected behavior
I expect there to be a way to make the js_of_ocaml pipeline fit in under 7GB of RAM, possibly with a flag, if necessary.

Versions
js_of_ocaml 5.7.2, ocaml 4.11.1

  • --no-inline is no longer necessary (since js_of_ocaml.5.7.0).
  • dealing with debug info seems to be slow in your use case. Removing --source-map should speed up your build. I'll try to investigate.
  • adding --disable globaldeadcode should give you a good speedup as well.

With the changes mentioned, I get the following for ExtractionJsOfOCaml/bedrock2_fiat_crypto.js:

112.97user 1.26system 1:54.24elapsed 99%CPU (0avgtext+0avgdata 4581104maxresident)k
0inputs+21904outputs (0major+1275732minor)pagefaults 0swaps

With #1614, one no longer needs to disable globaldeadcode, at the cost of extra memory usage:

78.16user 1.85system 1:20.13elapsed 99%CPU (0avgtext+0avgdata 6034576maxresident)k
0inputs+19696outputs (0major+1631738minor)pagefaults 0swaps

I'll try to improve #1614

I've updated #1614; I now get:

73.86user 1.00system 1:14.87elapsed 99%CPU (0avgtext+0avgdata 4581396maxresident)k
0inputs+19696outputs (0major+1209270minor)pagefaults 0swaps
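The measurements quoted in this thread are in GNU time's default output format. To compare them side by side, the elapsed time and peak resident size can be extracted with a small parser. This is a sketch under the assumption that the first line always has the shape shown here; `parse_time_line` is a hypothetical helper name.

```python
import re
from typing import Tuple

# Matches the first line of GNU time's default output, e.g.
# "73.86user 1.00system 1:14.87elapsed 99%CPU (0avgtext+0avgdata 4581396maxresident)k"
_TIME_RE = re.compile(r"(?P<elapsed>[\d:.]+)elapsed.*?(?P<rss>\d+)maxresident\)k")


def parse_time_line(line: str) -> Tuple[float, int]:
    """Return (elapsed seconds, peak resident set size in KiB)."""
    m = _TIME_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized time output: {line!r}")
    seconds = 0.0
    # Elapsed time is printed as [h:]m:ss.ss; fold the fields left to right.
    for part in m.group("elapsed").split(":"):
        seconds = seconds * 60 + float(part)
    return seconds, int(m.group("rss"))


if __name__ == "__main__":
    line = ("73.86user 1.00system 1:14.87elapsed 99%CPU "
            "(0avgtext+0avgdata 4581396maxresident)k")
    secs, kib = parse_time_line(line)
    print(f"{secs:.2f} s, {kib / 1024 / 1024:.2f} GiB peak")
```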

I still need to investigate the sourcemap issue.

Can you test #1614 and confirm it solves part of your issue?

@hhugo Regarding this, I have been working on the sourcemap slowness and will open a PR today.

@JasonGross, any luck with #1614?

I have not had a chance to try it, but if it works on your end, I don't see why it would be any different on GitHub Actions. (The files I linked to are the ones I actually use, not simplified examples.) But I can set up GHA to use the PR. Should I just clone the repo and run opam pin add . on that branch?

#1617 may help too if the memory consumption happens to be in sourcemaps. I find that it nearly halves the peak memory usage when linking JSOO using itself.

@JasonGross, I think you can just do

opam pin add js_of_ocaml-compiler https://github.com/ocsigen/js_of_ocaml.git#speedup

I've set it up on CI:
pre (specifically here)
post speedup (mit-plv/fiat-crypto#1922) (#1614) (still in progress)
post optim_sourcemap_link (mit-plv/fiat-crypto#1923) (#1617) (still in progress)

It looks like both #1614 and #1617 make both the run time and the peak memory usage worse. I haven’t worked on #1614, but that surprises me a lot in the case of #1617. Are these tests runnable on Linux? I may try to inspect them locally.

Not all CI jobs have been updated.

Here is what I see for #1614

1m28.92s | 4461616 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js

And for #1617

4m47.42s | 5253024 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js

compared to

6m08.88s | 7720996 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js

@OlivierNicole, I would expect your PR to only affect separate compilation during the link step, but I don't think separate compilation is involved here. What part of your PR would improve the situation during whole program compilation?

> Not all CI jobs have been updated.
>
> Here is what I see for #1614
>
> 1m28.92s | 4461616 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js

I didn’t quite follow which of the many jobs to inspect to find the info, but I trust that your numbers are right.

> @OlivierNicole, I would expect your PR to only affect separate compilation during the link step, but I don't think separate compilation is involved here. What part of your PR would improve the situation during whole program compilation?

I’m honestly not sure. Looking into it.

I switched to `Stringlit and Yojson.Raw (rather than `String and Yojson.Basic) because it saves a non-negligible amount of time when parsing and writing the mappings fields of source maps. Essentially, Yojson.Basic.to_string (`String s) checks for special characters or Unicode code points in the string, which takes a surprising amount of time and is unnecessary on mappings, since they contain only base64 numbers, commas and semicolons.

I’m not sure it explains it all though. Trying to profile locally.

> Are these tests runnable on Linux?

Yes. The cheapest way to run them is to download any of the artifacts labeled ExtractionJsOfOCaml-source* from our CI. These artifacts contain a handful of self-contained .ml files, the ones that we want to turn into .js files. I gave the flags I use in the initial post.

The expensive way to run the tests is to clone the repo, do opam install coq, and then run something like make js-of-ocaml

I can’t reproduce a significant difference in run time or profile between master and #1617. The time spent on source maps is negligible compared to the time spent optimizing. I’m starting to suspect that the CI run times have a huge variance.

P.S. I’ve done the test on with_bedrock2_fiat_crypto. Peak memory usage is not significantly affected, either.

#1614 has been merged. Reopen if you still have issues