Roadmap for tree-buf
Hey @That3Percent, thanks for the interesting library. I am the CEO of AppSpector, a remote debugging service for mobile apps. We collect a lot of data, and this year I've been looking into different data formats that could improve our compression ratio. Before I found tree-buf I had actually started prototyping a columnar format, something very similar to what you've built. Like Parquet on steroids :)
I wanted to ask about the directions you envision for this library. Do you expect significant format changes in the future?
If you need any help, I can participate in development.
There's still quite a bit of work to do, but depending on your needs it might be sufficient already. There are some breaking changes coming to the format, but with a pretty simple compatibility layer (which I could just add to the library) you would be insulated from those changes.
Here's a list of major milestones that are upcoming and might affect the format:
Transient data MVP
This is just a bunch of minor issues I want to tackle before saying that yes, Tree-Buf is compelling for several use-cases for non-permanent data and you aren't likely to run into any silly issues. For example, right now there is no signed integer support. It's not hard to add, but probably something a lot of people would run into. The tracking issue is here: #25
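As an illustration only: the thread does not say how signed integers would be encoded. One widely used technique (Protocol Buffers uses it for its `sint` types) is ZigZag mapping, which folds signed values onto unsigned ones so that small magnitudes stay small. A minimal sketch, with hypothetical function names that are not part of Tree-Buf:

```rust
/// Map a signed integer onto an unsigned one so that values close to zero
/// (positive or negative) become small unsigned numbers.
/// Hypothetical helper for illustration; not Tree-Buf's API.
pub fn zigzag_encode(n: i64) -> u64 {
    // `n >> 63` is an arithmetic shift: all ones for negatives, zero otherwise.
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
pub fn zigzag_decode(z: u64) -> i64 {
    // The low bit carries the sign; the remaining bits carry the magnitude.
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}
```

With this mapping 0, -1, 1, -2 become 0, 1, 2, 3, so an existing unsigned integer encoder could be reused unchanged.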
Iterators and Chunking
The way Tree-Buf re-structures your data into a columnar-like format limits the applicability for huge files (anything bigger than would comfortably fit in RAM is a problem). If your data is the sort that you might find in CSV this is not hard to address, but for the more general case with nested Vecs which themselves may be very large, it becomes a more interesting problem to support both reading and writing efficiently. I think I know what I need to do though. Tracking issues are here: #17 #5 #4
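To make the memory concern concrete, here is a minimal sketch of a row-to-column transpose for flat, CSV-like records, plus a chunked variant. The types and function names are hypothetical and do not reflect Tree-Buf's internals; the nested-`Vec` case mentioned above is deliberately not covered.

```rust
/// Row-oriented record, as an application would naturally model it.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    pub timestamp: u64,
    pub value: f64,
}

/// Column-oriented layout: one contiguous buffer per field.
#[derive(Debug, PartialEq)]
pub struct SampleColumns {
    pub timestamps: Vec<u64>,
    pub values: Vec<f64>,
}

/// Transpose rows into columns. Every row has to be visited before any
/// column is complete, so the whole input is resident in memory at once.
pub fn to_columns(rows: &[Sample]) -> SampleColumns {
    let mut cols = SampleColumns {
        timestamps: Vec::with_capacity(rows.len()),
        values: Vec::with_capacity(rows.len()),
    };
    for row in rows {
        cols.timestamps.push(row.timestamp);
        cols.values.push(row.value);
    }
    cols
}

/// Chunked variant: lazily transpose at most `chunk_len` rows at a time,
/// so a writer only needs one chunk's worth of columns in memory.
pub fn to_column_chunks(
    rows: &[Sample],
    chunk_len: usize,
) -> impl Iterator<Item = SampleColumns> + '_ {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    rows.chunks(chunk_len).map(to_columns)
}
```

For flat records the chunked form is straightforward. The harder case is when a single field is itself a very large `Vec`, because then one row can exceed a chunk.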
SIMD Accelerated Compressors
Once the broader format is locked down, it'll be time to go over each data type again and select state of the art compressors for that type. Right now, just some basic libraries are used and there is no SIMD. So there's potential for a lot of performance and compression gains here. After this is done, I think the format will probably be sealed.
More Languages
With the format sealed, we can add support for other languages besides Rust. I'm not sure what will be most important, but C++, Swift, Go, C#, and Java seem like obvious candidates (probably in that order?). I'm thinking of skipping JavaScript and just compiling Rust to wasm for in-browser use. JavaScript is an objectively bad language for serializing/deserializing with high performance.
How you can help
The best way to help right now would probably be just to use Tree-Buf and keep an open line of communication going about the details of your use-case and what could make Tree-Buf better for you. We did that for BOSS here: #6 and you can see a pretty drastic difference from the beginning to the end of that thread in how the format and user experience improved.
Hey @That3Percent! Sorry for the long delay in replying; my son was born a few weeks after our discussion, so I was pretty busy. I am going to implement a more complicated prototype using tree-buf and come back to you with more detailed feedback.
Congrats! I've been pretty slammed in life too. Hopefully I should be able to dedicate some time to addressing your feedback as soon as it's ready.
I've been putting some thought into making the format and usage amenable to upgrades as well. There should be a 1.0 release before all of the above features are done, to make sure that Tree-Buf is not perpetually in 0.x. Just by way of example, SIMD accelerated compression should not hold off a 1.0 release. Instead, the writer could take an argument naming the format version to stay compatible with, and it would not use any features gated behind a later version.
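A rough sketch of what such a version argument could look like. Everything here is hypothetical (the type names, the version numbers, and which compressor lands in which version); it only illustrates the gating idea, not an actual Tree-Buf API.

```rust
/// Format version the writer should stay compatible with.
/// Field order matters: the derived ordering compares `major`, then `minor`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct FormatVersion {
    pub major: u16,
    pub minor: u16,
}

/// Hypothetical compressor families the writer can choose from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compressor {
    Simple,
    Simd,
}

const ALL_COMPRESSORS: [Compressor; 2] = [Compressor::Simple, Compressor::Simd];

impl Compressor {
    /// First format version whose readers understand this compressor.
    /// The numbers are made up for the example.
    pub fn introduced_in(self) -> FormatVersion {
        match self {
            Compressor::Simple => FormatVersion { major: 1, minor: 0 },
            Compressor::Simd => FormatVersion { major: 1, minor: 1 },
        }
    }
}

/// Compressors the writer may use when targeting `target`: anything
/// introduced in a later version is gated off.
pub fn allowed_compressors(target: FormatVersion) -> Vec<Compressor> {
    ALL_COMPRESSORS
        .iter()
        .copied()
        .filter(|c| c.introduced_in() <= target)
        .collect()
}
```

Under these assumptions, a writer targeting 1.0 would only ever pick `Compressor::Simple`, so files it produces stay readable by 1.0 readers even after SIMD compressors ship.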
@That3Percent what's the maintenance status of this library?
@danieleades If I'm being realistic I'd sadly have to report that I've abandoned this library for the foreseeable future. My present role as Principal System Architect at Edge & Node working on The Graph consumes all of my mental energy.
I still think the world needs Tree-Buf, though. The idea is to show that efficiency and convenience are not at odds. Someday I would hope to even be able to pop open any Tree-Buf file in VSCode or similar and display/edit the contents as though it were a JSON file (Tree-Buf is self-describing). My thesis is that universal tooling support would end the age-old debate about human readability vs succinct data for some use-cases. Consider that UTF-8 is not human readable either, it's just widely supported.
But alas, I'm not the person to bring that into the world at this time. I think I've at least shown what's possible, and if I could find someone I trust who would want to take over, I'd be glad to hand over the keys. Or, if someone wants to fork or take inspiration for developing a new format, I'd support that too.
@danieleades Just to make sure there is no lingering false advertising, I verified that the status of this library is still set to experimental:
https://github.com/That3Percent/tree-buf/blob/master/tree-buf/Cargo.toml#L13