zarr-developers/community

WebAssembly implementation of Zarr

jakirkham opened this issue · 26 comments

Would be really nice to have a WebAssembly implementation of Zarr. This could make it possible to load Zarr files in the browser for viewing or for computation. Could also be useful to be able to work with in-memory Zarr objects in the browser. Given there has already been some good work to get Python and NumPy into the browser using Emscripten, it may be possible to just run the Python Zarr implementation in the browser. Supporting compression algorithms that are not in the Python standard library, though, will require getting Blosc and Numcodecs into WebAssembly (Blosc/c-blosc#238).

Note: WebAssembly is well supported. For older browsers one can convert WebAssembly to asm.js, which is pretty well supported. In the worst case, asm.js is still valid JavaScript, so it can be run as plain JavaScript (albeit slowly).

It's worth noting that Rust can compile to WebAssembly. This is a builtin feature of the Rust compiler. There is an N5 implementation for Rust, which could be a good place to start.

cc @aschampion

I'm planning on creating a simple WASM-compatible flag (or new dependent crate) for the Rust N5 implementation at some point. We plan to use this to improve viewing N5 volumes in CATMAID. Initially this should be as simple as adding a new backend that uses the fetch API instead of the filesystem and disabling compression modes that don't compile with WASM.
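To illustrate the "swap the backend" idea in the comment above, here is a minimal sketch, written in Python for brevity rather than Rust. The names `ReadableStore`, `FilesystemStore`, `HttpStore`, `http_get`, and `read_attributes` are hypothetical and do not come from the Rust N5 crate; the only N5 detail assumed is that dataset metadata lives in `attributes.json`.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable, Protocol


class ReadableStore(Protocol):
    """Hypothetical minimal read-only store: maps a key to raw bytes."""

    def get(self, key: str) -> bytes: ...


class FilesystemStore:
    """Reads keys as files below a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def http_get(url: str) -> bytes:
    """Default fetcher; in a browser this role would be played by fetch()."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class HttpStore:
    """Reads keys by fetching `base_url/key`; the fetcher is injectable."""

    def __init__(self, base_url: str, fetch: Callable[[str], bytes] = http_get) -> None:
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch

    def get(self, key: str) -> bytes:
        return self.fetch(f"{self.base_url}/{key}")


def read_attributes(store: ReadableStore, dataset: str) -> dict:
    """Backend-agnostic read of a dataset's `attributes.json` metadata."""
    return json.loads(store.get(f"{dataset}/attributes.json"))
```

Because `read_attributes` only depends on `get`, the calling code stays the same whether the bytes come from disk or over HTTP, which is why a new backend can be a comparatively small change.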

Related: https://gitter.im/zarr-developers/community?at=5ddc7f1c55bbed7ade461091
https://github.com/gzuidhof/zarr.js

Not WASM but aims to make zarr files accessible from the browser

Also noting there is a Scala implementation that can be compiled to JavaScript. Relevant discussion and more info in issue ( #15 ). That said, I'm not aware of a good path from Scala to WebAssembly.

I'd be interested in contributing to a rust and wasm implementation of zarr. Would anyone like to collaborate on this?

A wasm implementation of Zarr-Blosc decompression is here:

https://github.com/Kitware/itk-vtk-viewer/blob/7f82bbff02b6e8d847c76457fc07979be07c7ad5/src/bloscZarrDecompress.js

If there is interest, this could be separated out into a new package, and the corresponding compression function added (it has already been implemented in C/Emscripten). It would make sense to use this as the decompression for JavaScript / Typescript libraries like @gzuidhof 's zarr.js or @freeman-lab 's zarr-js.

This implementation supports all Blosc codecs. It also uses a pool of web workers to decompress a set of chunks in parallel and to optimize wasm compilation.
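The worker-pool pattern described above can be sketched roughly as follows. This is only a Python analogy, not the linked JavaScript implementation: a thread pool stands in for the pool of web workers, `zlib` stands in for Blosc, and the chunk data is made up for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_chunks(compressed: list[bytes], max_workers: int = 4) -> list[bytes]:
    """Decompress chunks concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order even when workers finish out of order.
        return list(pool.map(zlib.decompress, compressed))


# Hypothetical data: three small "chunks", each compressed independently.
raw_chunks = [bytes([i]) * 1024 for i in range(3)]
compressed_chunks = [zlib.compress(chunk) for chunk in raw_chunks]
restored = decompress_chunks(compressed_chunks)
```

Since each chunk is compressed independently, the chunks can be handed to workers without any coordination beyond collecting the results in order.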

Here is what it looks like in action:

https://kitware.github.io/itk-vtk-viewer/app/?fileToLoad=https://thewtex.github.io/allen-ccf-itk-vtk-zarr/average_template_50_chunked.zarr

@thewtex It would be awesome to make it a separate package. I am new to WebAssembly, but I'd be happy to contribute where I can.

Wow, that is mega cool. What is it?

A brain atlas averaged from 1675 mice 🐭 🐭 🐭

@vdwees Great, we'll create a package, your help is appreciated.

manzt commented

I just made numcodecs.js public which has a WASM blosc codec. Hopefully this will help others use blosc in their applications!

EDIT: It's a javascript module meant to be run in the browser and Node.

Nice work! Thanks for sharing @manzt. I wonder how hard it is to get Zarr usable from WebAssembly then (as it is pure Python at that point)

cc @rth @mdboom (who may be interested ;)

rth commented

If it is pure Python (and has pure Python wheels) you could install it from PyPI with Pyodide, but you would still need to write some code to interact with those JS/WASM libraries where it currently uses other Python packages with C extensions.

Hey there, I am very interested in the Zarr format, so I am available to create a WebAssembly/Rust lib, but I would like to implement the v3 spec directly. After reading some of the different topics in the Zarr spec repo, the v3 spec seems pretty great. Do you think I could start an implementation? Or should I wait for the Python implementation first?

Update: Sorry but I cannot work on this project due to some terms of my employment contract 😞

Hey @Farkal, welcome! That sounds great! 😄

Would be nice to have other people trying out the spec in other languages. This can help inform whether what we have in the spec makes sense or if it needs further modification. FWIW there is a WIP Python implementation here ( https://github.com/alimanfoo/zarrita ). Also we have been engaging with some folks from QuantStack on the C++ side and with NetCDF on the C side. It would be really interesting to see whether things make sense from the WebAssembly/Rust side.

Also we have a weekly spec meeting (details in issue #33) if you would be interested in stopping by. Would be nice to say hi and learn a bit more about what you are working on as well as how we can help 🙂

oeway commented

FYI: zarr-python and numcodecs are compiled into WASM and available as modules in Pyodide after this PR.

Click here to try a live demo with Pyodide + zarr running completely in the browser.
(Only works with Chrome or Firefox)

A next goal is to add a custom storage backend for Pyodide so that we can load Zarr arrays via HTTP. However, due to browser limitations, we cannot use fsspec with its HTTP backend directly. To enable this, we are currently working on the asyncio event loop, and we will likely also need to wait until multi-threading is supported in Pyodide.

Very cool! Thanks for sharing Wei 😄

cc @martindurant (who may be interested in fsspec usage)

This was discussed a bit on gitter. fsspec for Pyodide seems like a big benefit with or without zarr, but indeed the sync/thread stuff adds complexity that in this environment I don't think I'm in the best place to tackle. Happy to help test, though!

It looks like Pyodide is including Zarr & Numcodecs, which is cool to see 😄

@jakirkham : where does that leave this issue? :)

oeway commented

I added those two libraries to Pyodide a while ago. It works with in-memory data but is still very limited for any real application because we cannot support remote storage backends.

Not sure if this has been discussed already in the Zarr community, but the key feature to make that work is support for an async store (with asyncio). The native Python implementation of fsspec uses threading to convert async calls into sync ones, but multi-threading is not supported in Pyodide yet, so it will only work if Zarr supports an async store (meaning the getitem function will be async).
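The difference between the two approaches in the comment above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Zarr or fsspec API: `AsyncMemoryStore`, `SyncBridge`, and `read_many` are hypothetical names, and an in-memory dict stands in for a remote backend.

```python
import asyncio
import threading


class AsyncMemoryStore:
    """Hypothetical async store: getitem is a coroutine, as remote I/O would be."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def getitem(self, key: str) -> bytes:
        await asyncio.sleep(0)  # stand-in for awaiting a network response
        return self._data[key]


class SyncBridge:
    """Exposes an async store synchronously via an event loop in a helper thread.

    This is the general thread-based pattern, and it is the part that cannot
    work in an environment without thread support.
    """

    def __init__(self, store: AsyncMemoryStore) -> None:
        self._store = store
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def __getitem__(self, key: str) -> bytes:
        # Schedule the coroutine on the helper thread's loop and block on it.
        future = asyncio.run_coroutine_threadsafe(self._store.getitem(key), self._loop)
        return future.result()

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def read_many(store: AsyncMemoryStore, keys: list[str]) -> list[bytes]:
    """The async-native path: no threads, several reads awaited concurrently."""
    return list(await asyncio.gather(*(store.getitem(k) for k in keys)))


store = AsyncMemoryStore({"0.0": b"chunk-a", "0.1": b"chunk-b"})
bridge = SyncBridge(store)
sync_value = bridge["0.0"]  # blocking call backed by the helper thread
bridge.close()
async_values = asyncio.run(read_many(store, ["0.0", "0.1"]))
```

The `SyncBridge` path needs a second thread to run the event loop while the caller blocks, whereas `read_many` only needs a single event loop, which is the reason an async getitem matters in a single-threaded environment.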

Working on async zarr at https://github.com/martindurant/async-zarr as part of a company hack week

Already works in normal python asyncio, and maybe works in pyscript too, just need to write some HTML or something...

Thanks for sharing the details, @martindurant. May I know the exact dates for the hack week?
I'd like to post publicly about this to invite more contributors.

IIUC it is a hack week Anaconda is running for its employees

That's correct; and the hack is now over.
I made this little video of the current state: https://drive.google.com/file/d/1Ll-Lr_3Ckf_-WIlBkIPx4H8Kmz9lz4b9/view?usp=sharing

Thanks a lot, @martindurant. This is great.
I'll share this across our social media so that we can get the word out and invite new users/contributors.