CZI Essential OSS Cycle 1
alimanfoo opened this issue · 19 comments
This issue is for surfacing discussion and coordination towards an application to the CZI Essential OSS funding call.
The deadline for the first cycle is very soon (1 Aug) so we may not be able to pull everything together by then, but whatever we can do will help even if we decide to apply for the second cycle instead.
There are a number of key issues that we need to work through in framing the proposal. These include (1) what work/activities do we want to fund, (2) who can do the work, and (3) financial logistics.
cc @zarr-developers/core-devs
Regarding financial logistics, the general feeling so far has been that applying for NumFOCUS fiscal sponsorship would be the best route: the grant would go to NumFOCUS and they could disburse funds to one or more groups. I've opened #21 to coordinate a NumFOCUS application. We won't get that done in time for the 1 Aug deadline, but it might be enough if we have got the process underway.
Regarding what work/activities we want to fund, there has been some consensus around the following work packages, although comments/thoughts very welcome (ultimately this should all go into a roadmap document):
- Finalize and publish the core protocol v3 spec.
- Implement the v3 core protocol in multiple languages. Aiming for Python, JVM, C++, Julia and the NetCDF C library is not unrealistic given these all have some existing implementation work. Any other language implementations would also be very welcome.
- Develop and publish specs for a set of core compression codecs, which are codecs we think all zarr protocol implementations should aim to support (e.g., zlib, zstd, lz4, blosc, ...).
- Add support for core compression codecs in all Zarr implementations.
- Develop and publish specs for a set of core storage implementations, which are storage systems that we think all zarr protocol implementations should aim to support (e.g., file system, S3, GCS, ABS, ...).
- Add support for core storage layers in all Zarr implementations.
- Develop a community process and supporting systems for adding, revising and publishing specs, including the core protocol, codecs, storage layers, and protocol extensions.
- Do stuff to help support and grow the zarr developer community, including both core developers and developers working on extensions and/or libraries that build on zarr.
- Do stuff to help support and grow the zarr user community.
- Do other stuff to improve sustainability of the zarr project, e.g., develop a roadmap, establish a governance process, ...
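To make the "core compression codecs" work package a little more concrete, here is a minimal sketch of what a shared codec contract could look like: each codec has an identifier and a lossless encode/decode pair, and implementations look codecs up by identifier. The names `Codec`, `ZlibCodec`, `REGISTRY` and `get_codec` are hypothetical and only for illustration; this is not the v3 spec or any existing zarr API, and it uses only the Python standard library.

```python
import zlib
from typing import Protocol


class Codec(Protocol):
    """Hypothetical minimal codec contract: bytes in, bytes out."""

    codec_id: str

    def encode(self, chunk: bytes) -> bytes: ...

    def decode(self, data: bytes) -> bytes: ...


class ZlibCodec:
    """zlib codec built on the standard library (illustrative only)."""

    codec_id = "zlib"

    def __init__(self, level: int = 6) -> None:
        self.level = level

    def encode(self, chunk: bytes) -> bytes:
        return zlib.compress(chunk, self.level)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


# Hypothetical registry: maps a codec identifier to its implementation.
REGISTRY = {ZlibCodec.codec_id: ZlibCodec}


def get_codec(config: dict) -> Codec:
    """Build a codec from a config such as {"id": "zlib", "level": 1}."""
    params = dict(config)  # copy so the caller's dict is not mutated
    cls = REGISTRY[params.pop("id")]
    return cls(**params)


# Round-trip check: decoding an encoded chunk must return the original bytes.
codec = get_codec({"id": "zlib", "level": 1})
chunk = bytes(range(256)) * 64
assert codec.decode(codec.encode(chunk)) == chunk
```

A round-trip check like the last line is the kind of test every language implementation could share for each core codec.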
It's probably worth adding that once we do have the new spec and implementation out, we could benefit from general maintenance (e.g. answering questions about the spec, coaching developers in implementing spec extensions, engaging with domain specialists on how to store their data, etc.).
Regarding how we get this work funded and who does the work, one of the key difficulties we've been discussing is that with a 1-year grant it's not easy for any of the groups currently contributing to zarr to recruit new people. Everyone also has a number of existing commitments, so it's not easy to reallocate existing people even if funds are available. That may have changed, so if you or someone in your group (a) would like to contribute to zarr, and (b) could be freed up to work on zarr if grant funds were available, please let me know.
Another possibility we've been exploring is whether we could involve open source development companies and pass on funds for them to deliver specific pieces of work. There are two leading candidates in the Python sphere, @Quansight and @QuantStack.
@jakirkham and I had a brief chat with @teoliphant at SciPy about whether Quansight would be interested to work on zarr, and Travis was very positive. Quansight are developing the concept of a community work order (see recent blog post by @rgommers) which is a model for how open source projects might get specific pieces of work sponsored and delivered via a commercial org like Quansight. The people at Quansight are obviously excellent, and @jrbourbeau from Quansight has recently submitted a number of PRs against the zarr-python repo, which may just be his personal interest but I hope also indicates some interest from Quansight in the project.
QuantStack are also very highly recommended, and are responsible for developing xtensor among other great things, which @constantinpape has used within the C++ z5 project. IIRC @constantinpape said he had a number of very good interactions with the xtensor team during that work. Various people have suggested that we reach out to @SylvainCorlay to discuss whether they might be willing to take on some work.
A third suggestion has been to reach out to CZI directly to see if they would be interested in creating some in-house capacity to work on zarr, given that it is likely to be central to a number of their scientific projects. This is probably not something that would happen within the scope of the EOSS funding call, but could be a related conversation.
Any thoughts/comments very welcome.
The third option is very intriguing, also for projects like xarray which is in a similar predicament. I would be curious what CZI thinks.
Tagging @ambrosejcarr and @ttung as a heads up of the CZI mention. Additionally tagging @chris-allan & @melissalinkert from @glencoesoftware, a commercial partner of OME that could potentially get involved in Java support for Zarr.
@joshmoore @chris-allan @melissalinkert @glencoesoftware I am proposing n5 as a frontend to zarr for Java, so that the existing ImgLib2 and ImageJ APIs can be used as is. I am contemplating starting it myself because some Python-centric folks around me are storing their stuff as zarr.
@axtimwalde Do you have a feel for other Java counterparts of the items in @alimanfoo's list? (#22 (comment))
File system, S3 and GCS support would be straightforward because it is almost identical to the existing n5 backends. For compression codecs, zlib, lz4 and blosc should be fine as available; I haven't looked for others.
Note on file systems (and this is not the obvious place for this): with the appearance of fsspec ( https://github.com/intake/filesystem_spec ) and s3fs, gcsfs, dask and intake starting to use it, that should be the way forward for cases where we don't have an explicit mapper interface. Note that fsspec already brings extra support for ftp, ssh/sftp, webhdfs.
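For readers unfamiliar with the "mapper interface" idea: a store can be exposed as a plain key/value mapping from string keys to bytes, so the same array code can sit on top of any backend. The sketch below is a stdlib-only illustration of that idea for a local directory; `DirectoryStore` here is a hypothetical class written for this example, not the fsspec or zarr implementation (fsspec provides its own mapper, e.g. via `fsspec.get_mapper`).

```python
import os
import tempfile
from collections.abc import MutableMapping
from typing import Iterator


class DirectoryStore(MutableMapping):
    """Hypothetical mapping of '/'-separated keys to files under a root dir."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Translate a store key such as "a/b/0.0" into a local file path.
        return os.path.join(self.root, *key.split("/"))

    def __getitem__(self, key: str) -> bytes:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key: str, value: bytes) -> None:
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(value)

    def __delitem__(self, key: str) -> None:
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self) -> Iterator[str]:
        # Walk the tree and yield keys relative to the root, using "/".
        for dirpath, _dirs, files in os.walk(self.root):
            rel = os.path.relpath(dirpath, self.root)
            prefix = [] if rel == "." else rel.split(os.sep)
            for name in files:
                yield "/".join(prefix + [name])

    def __len__(self) -> int:
        return sum(1 for _ in self)


# Usage: write one chunk, read it back, list keys, then delete it.
store = DirectoryStore(tempfile.mkdtemp())
store["group/array/0.0"] = b"chunk bytes"
assert store["group/array/0.0"] == b"chunk bytes"
assert sorted(store) == ["group/array/0.0"]
del store["group/array/0.0"]
assert len(store) == 0
```

Swapping the local-file calls for a remote file system client would leave the mapping interface, and any code built on it, unchanged. The sketch does no key validation (for example against `..`), which a real store would need.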
> Another possibility we've been exploring is whether we could involve open source development companies and pass on funds for them to deliver specific pieces of work. There are two leading candidates in the Python sphere, @Quansight and @QuantStack.
> Quantstack are also very highly recommended, and are responsible for developing xtensor among other great things, which @constantinpape has used within the C++ z5 project. IIRC @constantinpape said he had a number of very good interactions with the xtensor team during that work. Various people have suggested that we reach out to @SylvainCorlay to discuss whether they might be willing to take on some work.
Hello, thank you for the kind words and for thinking of us. We would love to be involved and help deliver specific pieces of work, related to xtensor or not. Several of the work items listed above are very relevant to us.
Do not hesitate to contact us directly! cc @wolfv @JohanMabille.
> The people at Quansight are obviously excellent, and @jrbourbeau from Quansight has recently submitted a number of PRs against the zarr-python repo, which may just be his personal interest but I hope also indicates some interest from Quansight in the project
Thank you for the interest and praise @alimanfoo. And yes, I think Zarr is a very interesting project and fits well into the set of projects and activities that we'd like to push forward with Quansight Labs. Happy to discuss further here or at rgommers@quansight.com.
You can't really go wrong here, the @QuantStack team is awesome, and if you can get CZI or another company or lab to hire people directly, that would be fantastic news too.
Thank you @SylvainCorlay and @rgommers, very much appreciated.
Congrats on getting funded!
Hey - the Python Software Foundation got far more applicants than we can take when we put out an RfP for our CZI-funded work. Would it be ok for me to redirect a few of those folks to talk with y'all at Zarr? Is there a short role description/job posting I could point to?
Hi @brainwane, thanks a lot for reaching out, the pip rfp is fantastic, you folks are really organised!
Ryan Williams (@ryan-williams) from Mount Sinai medical school led the zarr czi proposal. Our plan was to use some of the money to fund a post at Mount Sinai and the remainder to go to an open source contractor. We are likely to use quantstack and/or quansight for the contractor part. I'm not sure yet if Ryan's plan was to recruit a new post at Mount Sinai or fund an existing post. Let me catch up with Ryan and get back to you.
Congrats on getting funded!
CZI EOSS Round 1 Interim Report - Zarr (public).pdf
Here is the "Progress Overview" portion of the interim report we submitted to CZI last week. We got a no-cost extension to finish some of this work (grant period ending at the end of March, report due May 1), and have a decent amount of funds left.
The final report for CZI EOSS1 was submitted on August 1, 2022.
Thanks, everyone!
Closing this now.