ethanhs/python-wasm

Use stdlib zip bundle to reduce size of python.data

tiran opened this issue · 4 comments

tiran commented

Fast hack:

mkdir -p builddep/wasi/usr/local/lib/
cd Lib
zip -0 -r ../builddep/wasi/usr/local/lib/python311.zip *.py asyncio concurrent email encodings collections html http importlib logging multiprocessing sqlite3 urllib wsgiref xml zoneinfo
emcc -o python.html Programs/python.o libpython3.11d.a -ldl -lm --preload-file usr/

This reduces python.data to about 9 MB. I have to use zip -0 because my build env does not have zlib for wasm in its sysroot. For a production build we want to create and include __pycache__ as well as compress the file.
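The difference between the `zip -0` hack and a compressed production bundle can be sketched with Python's own `zipfile` module. This is only an illustration, not the project's build script: `make_stdlib_zip` is a hypothetical helper name, it bundles only `.py` files (the `zip` command above also picks up everything else in the listed directories), and the deflate path assumes zlib is available both on the host that writes the archive and in the interpreter that later imports from it.

```python
import tempfile
import zipfile
from pathlib import Path


def make_stdlib_zip(lib_dir, out_path, compress=False):
    """Bundle every .py file under lib_dir into one zip archive.

    compress=False stores members uncompressed, like `zip -0`.
    compress=True uses deflate, which requires zlib.
    Returns the size of the archive in bytes.
    """
    lib_dir = Path(lib_dir)
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    # compresslevel is ignored for ZIP_STORED, so passing it is harmless.
    with zipfile.ZipFile(out_path, "w", compression=method, compresslevel=9) as zf:
        for path in sorted(lib_dir.rglob("*.py")):
            # Archive names are relative to lib_dir, e.g. "asyncio/events.py".
            zf.write(path, path.relative_to(lib_dir).as_posix())
    return Path(out_path).stat().st_size


if __name__ == "__main__":
    # Tiny stand-in for Lib/ so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        lib = Path(tmp, "Lib")
        (lib / "pkg").mkdir(parents=True)
        (lib / "pkg" / "__init__.py").write_text("x = 1\n" * 2000)
        stored = make_stdlib_zip(lib, Path(tmp, "stored.zip"))
        deflated = make_stdlib_zip(lib, Path(tmp, "deflated.zip"), compress=True)
        print(f"stored: {stored} bytes, deflated: {deflated} bytes")
```

On a real tree the call would look like `make_stdlib_zip("Lib", "python311.zip", compress=True)`; the output name here simply mirrors the one used in the commands above.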

I was actually just playing with this! With __pycache__ it seems to be 60MB :/

For zlib with emscripten we can set ZLIB_CFLAGS="-s USE_ZLIB=1" (see the emscripten ports docs: https://emscripten.org/docs/compiling/Building-Projects.html#emscripten-ports), and I think that will work.

Also I should mention that emscripten has a -s LZ4=1 option, which LZ4-compresses the data file and cuts the non-zipped data roughly in half.

I'm about to push some changes to how my scripts build things, to a) use an out-of-tree build and b) start hacking away at the build artifacts so that only what is needed is included.

EDIT: pushed the changes I've made so far to limit size. I'm going to play with zipping Lib/ more next :)

tiran commented

To reduce the size further, remove

  • test
  • tkinter
  • turtledemo
  • encodings/*.py, keeping only the compiled pyc files.
  • additional pyc files for -O and -OO: *.opt-1.pyc *.opt-2.pyc. I don't think anybody will run wasm builds with extra CLI options for now.
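The removal list above could be expressed as a small path filter, sketched here under some assumptions: `OMIT_DIRS` and `should_omit` are hypothetical names, paths are taken relative to `Lib/`, and only top-level directories are matched. It is not the config referenced in this thread.

```python
from pathlib import PurePosixPath

# Top-level directories named in the list above (hypothetical constant).
OMIT_DIRS = {"test", "tkinter", "turtledemo"}


def should_omit(rel_path, keep_encodings_source=False):
    """Return True if a path relative to Lib/ should be left out of the bundle."""
    path = PurePosixPath(rel_path)
    top = path.parts[0] if path.parts else ""
    # Drop whole top-level packages such as test/ and tkinter/.
    if top in OMIT_DIRS:
        return True
    # Drop bytecode built for -O and -OO (*.opt-1.pyc, *.opt-2.pyc).
    if path.name.endswith((".opt-1.pyc", ".opt-2.pyc")):
        return True
    # Drop encodings/*.py sources; compiled .pyc files are not matched here.
    if (
        not keep_encodings_source
        and top == "encodings"
        and len(path.parts) == 2
        and path.suffix == ".py"
    ):
        return True
    return False


if __name__ == "__main__":
    for candidate in ("test/test_os.py", "asyncio/events.py", "encodings/utf_8.py"):
        print(candidate, "omit" if should_omit(candidate) else "keep")
```

One caveat when dropping sources: the regular import system ignores a pyc inside `__pycache__` when the matching `.py` is missing, so sourceless modules need their pyc in the legacy location next to where the source was (for example, as produced by `python -m compileall -b`).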

Added a config for it here: fa70804

I think some other modules could probably be removed as well, like distutils, but this gets the data file down to 20MB. I also removed LZ4, since that won't help much if we're already zipping everything at the highest compression setting.
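To find further removal candidates like distutils, one option is to rank the top-level entries of `Lib/` by size on disk. The following is a rough sketch with a hypothetical helper name, `size_by_top_level`; it reports uncompressed sizes, so it only approximates each entry's share of the zipped data file.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def size_by_top_level(lib_dir):
    """Sum file sizes in bytes under each top-level entry of lib_dir."""
    lib_dir = Path(lib_dir)
    totals = Counter()
    for root, _dirs, files in os.walk(lib_dir):
        for name in files:
            path = Path(root, name)
            # First path component below lib_dir, e.g. "distutils" or "os.py".
            top = path.relative_to(lib_dir).parts[0]
            totals[top] += path.stat().st_size
    # Largest entries first, so the heaviest packages lead the list.
    return totals.most_common()


if __name__ == "__main__":
    # Tiny stand-in for Lib/ so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        lib = Path(tmp)
        (lib / "distutils").mkdir()
        (lib / "distutils" / "core.py").write_bytes(b"x" * 300)
        (lib / "os.py").write_bytes(b"x" * 50)
        for name, size in size_by_top_level(lib):
            print(f"{size:>8} {name}")
```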