nodejs/node

buffer.write(buffer, 'base64')

Closed this issue Β· 16 comments

What's the fastest way to convert a buffer containing base64-encoded data to a buffer containing binary data?

Something like this?

var base64Buffer = ...; // e.g. A buffer containing base64 provided by a mime decoder.
var binaryBuffer = new Buffer(base64Buffer.toString('ascii'), 'base64')

Is there a way to write the base64 buffer directly into another buffer, without creating any interim strings?

I thought of binaryBuffer.write(base64Buffer, 'base64') but write() only accepts strings.

@jorangreef There's always going to be an interim string, since you have to first "decode" ASCII by going from Buffer to string, then decode base64 by going from string to Buffer.

Your first snippet is fine, but I'd recommend using the new API: Buffer.from(base64Buffer.toString(), 'base64')

@seishun I think he means without creating a V8 String object. It's entirely possible, it's just not something that's currently available AFAIK.

This is partly available in the buffer.transcode API, but base64 encoding is not supported.

This is partly available in the buffer.transcode API, but base64 encoding is not supported.

Because it uses ICU, and base64 isn't a way to encode text as binary data, it's the other way around. IMO supporting it in buffer.transcode would be kind of confusing.

buffer.transcode() intentionally only deals with efficient conversion of char sets. It does not deal with transformations like hex or base64 and never should.

Your first snippet is fine, but I'd recommend using the new API: Buffer.from(base64Buffer.toString(), 'base64')

@seishun thanks, using the constructor was a typo from habit.

I think he means without creating a V8 String object. It's entirely possible, it's just not something that's currently available AFAIK.

@mscdex yes, that's what I was getting at, avoiding the String allocation. It should be possible, but not provided by the api currently.

buffer.transcode() intentionally only deals with efficient conversion of char sets. It does not deal with transformations like hex or base64 and never should.

@jasnell agreed, hence the suggestion for buffer.write(buffer, 'base64') or another alternative. I don't think transcode is the way to do it.

buffer.write(buffer, 'base64') would be pretty confusing too. You're writing a buffer into a buffer, yet there's an encoding argument.

to be honest, I'd like to see a revision to Hex and Base64 support in the APIs because things are fairly inconsistent as they are now (e.g. Buffer.from([...]).toString('utf8') does not actually convert the bytes to utf8 but Buffer.from[...]).toString('hex') does convert to hex).

What would be more correct would be something like:

Buffer.from([...]).encode('base64');
Buffer.from([...]).encode('hex');

Unfortunately toString('base64') is one of the most common uses of Buffer in the ecosystem so this is likely something that can never actually change.

I agree, it’s been bugging me since I started node that toString('utf8') decodes from UTF-8 and toString('hex') encodes as Hex. If somebody has a good idea for how to tackle that wrt our API I’d love to see it. :)

I agree it's confusing, but on the other hand, toString(enc) just says "convert to string using enc" which makes sense for both hex and UTF-8.

@addaleax ... A require('buffer').encode(buf, 'hex') and require('buffer').encode(buf, 'base64') could easily deal with the discrepancy and would pair nicely with require('buffer').transcode(). The return value would be another Buffer with the encoded bytes. Introducing a new API that makes more sense is not really the difficult issue, however. It's transitioning people off the existing buf.toString('hex') and buf.toString('base64') that's the real difficulty.

Buffer creation is equally problematic. Buffer.from(bytes, 'utf8') and Buffer.from(bytes, 'hex') are quite different from one another. The inconsistency is irritating and better APIs could certainly help but getting userland to actually use those improved APIs has been a huge difficulty.

@jasnell I'd be against adding new APIs unless we are also planning to phase out the old ones, and we know well that it's not as easy as we'd wish. And the existing API isn't exactly bad either.

Yep, which is why I haven't pursued it. The current API is not painful enough to justify the level of disruption changing it would cause.

Can this be closed as an answered question? All the ways currently exposed by Node use an temporary string, and that makes sense, because as @seishun explained this question is really asking about base64+ascii β€œdouble” encoding.

Indeed, closing as an answered question.

In case anyone has an issue with the lack of buffer-to-buffer Base64 encoding/decoding support in Node, I just finished https://github.com/ronomon/base64 as a stop-gap until these and other base64 issues are addressed. The encoder/decoder accepts and returns buffers without allocating any interim V8 Strings. The throughput gain from avoiding unnecessary string allocation is substantial over 4 MB+ buffers.