[BUG] js_of_ocaml is excessively memory hungry
JasonGross opened this issue · 20 comments
Describe the bug
js_of_ocaml is very cool! I use it on CI to generate a webpage. However, I cannot use it on the new GitHub Actions arm64 macOS runners, which have only 7 GB of RAM, because it sometimes uses 8-9 GB of RAM to generate a single .js file. For example, here is a table of build times and memory usage on Linux:
Time | Peak Mem | File Name
---------------------------------------------------------------------------
15m11.31s | 8904232 ko | Total Time / Peak Mem
---------------------------------------------------------------------------
4m46.85s | 8808932 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.js
4m37.20s | 7151532 ko | ExtractionJsOfOCaml/fiat_crypto.js
4m30.42s | 8904232 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js
0m25.80s | 3195952 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.byte
0m25.32s | 3193600 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.byte
0m24.60s | 2814364 ko | ExtractionJsOfOCaml/fiat_crypto.byte
0m00.38s | 102780 ko | ExtractionJsOfOCaml/bedrock2_fiat_crypto.cmi
0m00.37s | 98468 ko | ExtractionJsOfOCaml/fiat_crypto.cmi
0m00.37s | 102360 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.cmi
It is similar on macOS, and a bit better on Debian sid.
I invoke it with --source-map --no-inline --enable=effects and invoke the compiler with -package js_of_ocaml -package unix -w -20 -g
For the near future (until artifacts expire), the build artifacts page contains generated .js files (fiat-html-js-of-ocaml), .ml source files (ExtractionJsOfOCaml-source-master), and compiled files (ExtractionJsOfOCaml-master-ocaml-4.11.1).
Expected behavior
I expect there to be a way to make the js_of_ocaml pipeline fit in under 7GB of RAM, possibly with a flag, if necessary.
Versions
js_of_ocaml 5.7.2, ocaml 4.11.1
- --no-inline is no longer necessary (since js_of_ocaml.5.7.0).
- Dealing with debug info seems to be slow in your use case. Removing --source-map should speed up your build. I'll try to investigate.
- Adding --disable globaldeadcode should give you a good speedup as well.
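The suggestions above can be sketched as a small build helper. This is a minimal sketch, not the project's actual build rule: it assumes the bytecode is produced with `ocamlfind ocamlc`, and the `-linkpkg` flag, the `DRY_RUN` switch, and the function names `run` and `build_js` are assumptions added for illustration. The remaining flags are the ones quoted in this thread.

```shell
# Print a command, then run it unless DRY_RUN=1 (hypothetical helper).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# Build <name>.ml into <name>.byte and then <name>.js (hypothetical helper).
build_js() {
  src="$1"
  byte="${src%.ml}.byte"
  js="${src%.ml}.js"
  # Compiler flags as given in the original report; -linkpkg is an assumption.
  run ocamlfind ocamlc -package js_of_ocaml -package unix -w -20 -g -linkpkg "$src" -o "$byte" &&
  # --no-inline and --source-map dropped, global dead-code pass disabled.
  run js_of_ocaml --enable=effects --disable globaldeadcode "$byte" -o "$js"
}
```

Calling `DRY_RUN=1 build_js fiat_crypto.ml` prints the two commands without running them, which makes it easy to check the flags before committing CI time.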
With the changes mentioned, I get the following for ExtractionJsOfOCaml/bedrock2_fiat_crypto.js:
112.97user 1.26system 1:54.24elapsed 99%CPU (0avgtext+0avgdata 4581104maxresident)k
0inputs+21904outputs (0major+1275732minor)pagefaults 0swaps
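To compare such a measurement against the 7 GB runner limit, the peak resident set size can be pulled out of GNU time's default output. A minimal sketch, assuming the `...maxresident)k` line format shown above and treating 7 GB as 7 × 1024 × 1024 KB; the function names `peak_kb` and `fits_in_budget` are hypothetical.

```shell
# Extract the peak RSS (in KB) from GNU time's default output on stdin.
peak_kb() {
  sed -n 's/.*[^0-9]\([0-9][0-9]*\)maxresident.*/\1/p'
}

# Succeed if the given peak (KB) fits in the given budget (GB, default 7).
fits_in_budget() {
  peak="$1"
  budget_gb="${2:-7}"
  [ "$peak" -le $((budget_gb * 1024 * 1024)) ]
}
```

Feeding the line above into `peak_kb` gives 4581104, which `fits_in_budget` accepts. The 8904232 peak from the original table would be rejected, assuming its "ko" unit also means KB.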
@hhugo Regarding this, I have been working on the sourcemap slowness and will open a PR today.
@JasonGross, any luck with #1614 ?
I have not had a chance to try it, but if it works on your end, I don't see why it would be any different on GitHub Actions. (The files I linked to are the ones I actually use, not simplified examples.) But I can set up GHA to use the PR. Should I just clone the repo and run opam pin add . on that branch?
#1617 may help too if the memory consumption happens to be in sourcemaps. I find that it nearly halves the peak memory usage when linking JSOO using itself.
@JasonGross, I think you can just do
opam pin add js_of_ocaml-compiler https://github.com/ocsigen/js_of_ocaml.git#speedup
I've set it up on CI:
pre (specifically here)
post speedup (mit-plv/fiat-crypto#1922) (#1614) (still in progress)
post optim_sourcemap_link (mit-plv/fiat-crypto#1923) (#1617) (still in progress)
Not all CI jobs have been updated.
Here is what I see for #1614
1m28.92s | 4461616 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js
And for #1617
4m47.42s | 5253024 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js
compared to
6m08.88s | 7720996 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js
@OlivierNicole, I would expect your PR to affect only the link step of separate compilation, but I don't think separate compilation is involved here. What part of your PR would improve the situation during whole-program compilation?
> Not all CI jobs have been updated.
> Here is what I see for #1614
> 1m28.92s | 4461616 ko | ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.js
I didn’t quite follow which of the many jobs to inspect to find the info, but I trust that your numbers are right.
> @OlivierNicole, I would expect your PR to affect only the link step of separate compilation, but I don't think separate compilation is involved here. What part of your PR would improve the situation during whole-program compilation?
I’m honestly not sure. Looking into it.
I switched to `Stringlit and Yojson.Raw (rather than `String and Yojson.Basic) because it saves a non-negligible amount of time on the parsing and writing of the mappings fields of source maps. Essentially, Yojson.Basic.to_string (`String s) checks for special characters or Unicode code points in the string, which takes a surprising amount of time and is unnecessary on mappings, since they contain only base64 numbers, commas and semicolons.
I’m not sure it explains it all though. Trying to profile locally.
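The claim about mappings can be checked mechanically. A minimal sketch, assuming a mappings string has already been extracted from a source map (the extraction is not shown, and the function name `is_plain_mappings` is hypothetical): it accepts only the base64 alphabet plus commas and semicolons, none of which need escaping inside a JSON string.

```shell
# Succeed if the argument consists only of base64 digits, ',' and ';'.
is_plain_mappings() {
  case "$1" in
    *[!A-Za-z0-9+/,\;]*) return 1 ;;  # found a character outside the set
    *) return 0 ;;
  esac
}
```

A string that passes this check can be written between quotes verbatim, which is the property that makes the escaping pass redundant for this one field.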
Are these tests runnable on Linux?
Yes. The cheapest way to run them is to download any of the artifacts labeled ExtractionJsOfOCaml-source* from our CI. These artifacts contain a handful of self-contained .ml files, the ones that we want to turn into .js files. I gave the flags I use in the initial post.
The expensive way to run the tests is to clone the repo, do opam install coq, and then run something like make js-of-ocaml
I can’t reproduce a significant difference in run time or profile between master and #1617. The time spent on source maps is negligible compared to the time spent optimizing. I’m starting to suspect that the CI run times have a huge variance.
P.S. I’ve done the test on with_bedrock2_fiat_crypto. Peak memory usage is not significantly affected either.