tc39/proposal-arraybuffer-base64

Language feature or web platform feature?

lucacasonato opened this issue · 5 comments

Hey folks 👋

I created a proposal to add a more general binary encoding API to the web platform a few weeks back at https://github.com/lucacasonato/proposal-binary-encoding. It is a proposal for addition into HTML or W3C, so it has access to more underlying infrastructure (for example WHATWG streams for streaming support). The API is aligned to the web standard TextEncoder / TextDecoder (but with reversed roles). It also supports hex encoding, and has an obvious path to add other encodings in the future (for example base64url, base62 or base32).

We should discuss if this makes more sense as a language, or a platform feature. For example how would a streaming variant of this proposal look, and how could it be extended to hex or base32 encoding?

Hi, thanks for opening this!

We should discuss if this makes more sense as a language, or a platform feature.

Personally I think it is a much better fit as a language feature: there is nothing web- or host-specific about it, and it's a much nicer experience for it to exist on ArrayBuffer proper rather than in some other API. Plus the TextEncoder interfaces are, frankly, quite clunky, though there's nothing inherent about being a web-platform feature which makes them so.

For example how would a streaming variant of this proposal look, and how could it be extended to hex or base32 encoding?

I'm not sure that streaming really makes sense for this API? This is specifically about translating between an ArrayBuffer and a Base64 string, rather than an arbitrary stream of binary data. It seems reasonable for the web platform to support both, if you have a use case that wants to operate on a stream of binary data.

As to other representations: the way I'd expand this proposal to include hex would be to add toHex and fromHex methods. Similarly for base32, though in my experience base32 does not seem common enough to warrant language (or platform) support.
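To make the toHex / fromHex idea concrete, here is a minimal userland sketch. It is written as standalone functions over Uint8Array rather than as methods on ArrayBuffer, and the exact signatures and error behaviour are assumptions for illustration, not something the proposal specifies.

```typescript
// Hypothetical standalone helpers; only the names toHex / fromHex come from the discussion.
function toHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    // Each byte becomes exactly two lowercase hex digits.
    out += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return out;
}

function fromHex(hex: string): Uint8Array {
  // Assumption: reject odd lengths and non-hex characters instead of skipping them.
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new SyntaxError("invalid hex string");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}
```

For example, toHex(new Uint8Array([0, 255, 16])) gives "00ff10", and fromHex turns that string back into the same three bytes.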

Yeah, I agree that for the simplest use case (a few hundred bytes of data), a direct ArrayBuffer to string conversion as a method on ArrayBuffer would likely be most useful. I am just worried that this would be too inflexible for use cases with very large datasets. Those are exactly the cases where a streaming API would be useful:

Consider for example a use case where you want to upload a large file to a remote server in base64 encoding. Having to load the entire file into memory is very inefficient - you would rather progressively pipe that through a stream combinator that can process data in chunks.

As for other representations: wouldn't a more generic fromBinary / toBinary method with support for multiple encodings make more sense? That would also allow for support for base64url. The encoding would be passed as the first argument to the method.

Consider for example a use case where you want to upload a large file to a remote server in base64 encoding. Having to load the entire file into memory is very inefficient - you would rather progressively pipe that through a stream combinator that can process data in chunks.

Sure, and like I said it seems reasonable for the web platform to support a stream transformer as well. But for the use case I have personally encountered, what I want is to translate between an ArrayBuffer and a string of base64, and getting streams involved in that seems pretty ugly compared to the API in this proposal. I think trying to have a single API that solves both use cases would be to the detriment of both.

wouldn't a more generic fromBinary / toBinary method with support for multiple encodings make more sense?

I think that ends up being strictly worse than having a few different methods for various encodings. Having methods is clearer and is more discoverable, and the different methods have different options (e.g. for base64 you want to have options for padding and alphabet, for hex you have capitalization) that wouldn't really make sense to merge into one thing.
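A rough sketch of what "different methods have different options" could look like. The option names (alphabet, padding, uppercase) and their defaults are hypothetical, chosen only to mirror the examples in the comment above. The functions are standalone stand-ins for methods.

```typescript
// Hypothetical option bags; none of these names are taken from the proposal.
interface Base64Options {
  alphabet?: "base64" | "base64url";
  padding?: boolean; // assumption: padding is on unless explicitly disabled
}
interface HexOptions {
  uppercase?: boolean;
}

const STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const URL_SAFE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function toBase64(bytes: Uint8Array, options: Base64Options = {}): string {
  const table = options.alphabet === "base64url" ? URL_SAFE : STANDARD;
  const fill = options.padding === false ? "" : "=";
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i);
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    // Three input bytes map to four 6-bit indices into the alphabet.
    out += table.charAt(b0 >> 2);
    out += table.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += remaining > 1 ? table.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : fill;
    out += remaining > 2 ? table.charAt(b2 & 0x3f) : fill;
  }
  return out;
}

function toHex(bytes: Uint8Array, options: HexOptions = {}): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return options.uppercase ? out.toUpperCase() : out;
}
```

With a single generic method taking the encoding as its first argument, the options parameter would have to be a union of these two bags, and a call could pass uppercase to a base64 encode without it meaning anything.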

Why not both?

WebAssembly is a language API which includes optional support for streams by specifying an API which the platform can choose to implement:

The Web embedding includes additional methods useful in that context. In non-web embeddings, these APIs may not be present.

These are compileStreaming and instantiateStreaming, and browsers support them while Node.js doesn't.

So there is precedent for a language feature which has a platform extension, and I think this is the best of both worlds.

Closing as settled.

I would follow with interest a proposal for a base64 stream transformer on the web platform. This proposal exposes the primitives necessary to build such a thing in userland, but does not include any explicit streaming.
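As an illustration of building chunked encoding in userland on top of a one-shot primitive, here is a minimal sketch. encodeBase64 is a hand-written stand-in for whatever one-shot base64 method the language ends up providing, and Base64ChunkEncoder is a hypothetical name. The only real trick is holding back up to two leftover bytes between chunks so that padding appears only at the very end.

```typescript
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Stand-in for a one-shot base64 primitive (assumption: standard alphabet, padded).
function encodeBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i);
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    out += ALPHABET.charAt(b0 >> 2);
    out += ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += remaining > 1 ? ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += remaining > 2 ? ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Hypothetical chunked encoder layered on the one-shot primitive.
class Base64ChunkEncoder {
  private carry: Uint8Array = new Uint8Array(0);

  // Encode every complete 3-byte group; keep the 0 to 2 trailing bytes for later.
  push(chunk: Uint8Array): string {
    const data = new Uint8Array(this.carry.length + chunk.length);
    data.set(this.carry, 0);
    data.set(chunk, this.carry.length);
    const usable = data.length - (data.length % 3);
    this.carry = data.slice(usable);
    return encodeBase64(data.subarray(0, usable));
  }

  // Flush the held-back bytes; this is the only place padding can appear.
  finish(): string {
    const out = encodeBase64(this.carry);
    this.carry = new Uint8Array(0);
    return out;
  }
}
```

On a platform with WHATWG streams, push and finish would map naturally onto the transform and flush callbacks of a TransformStream, which is roughly the shape a platform-level base64 stream transformer could take.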