nodejs/node

More useful error than `toString failed` when Buffer length greater than kMaxLength?

mhart opened this issue ยท 46 comments

mhart commented

I'm assuming there's some sort of memory constraint trying to convert buffers into strings?

$ uname -a
Darwin ReBuke-Pro.local 15.0.0 Darwin Kernel Version 15.0.0: Wed Aug 26 16:57:32 PDT 2015; root:xnu-3247.1.106~1/RELEASE_X86_64 x86_64
$ node --version
v4.1.1
$ node -p 'new Buffer(268435440).toString().length'
268435440

But:

$ node -p 'new Buffer(268435441).toString().length'
buffer.js:378
    throw new Error('toString failed');
    ^

Error: toString failed
    at Buffer.toString (buffer.js:378:11)
    at [eval]:1:23
    at Object.exports.runInThisContext (vm.js:54:17)
    at Object.<anonymous> ([eval]-wrapper:6:22)
    at Module._compile (module.js:434:26)
    at node.js:566:27
    at doNTCallback0 (node.js:407:9)
    at process._tickCallback (node.js:336:13)

Firstly, I'm not sure that this should even be an error (are we really only limited to 256MB? That's... not very much). This seems to have been addressed in the past, but I feel like a regression might have occurred: #1374 (comment)

But that aside, can we actually detect what the error is and report on it in a more detailed way? The current toString check provides little insight.

256 MB is the limit for strings in V8 so no, it's not a regression. What would you report in more detail?

mhart commented

Anything would be better than that TBH.

mhart commented

It's not obvious to the user what's actually gone wrong.

How about "Cannot create string larger than 256MB"? Seems a lot clearer to me.

I think a pull request to that effect would be acceptable.

mhart commented

But is that the only error that can occur? Looking at the code at the moment, the check is pretty basic โ€“ is there no way of getting more detail as to why result may be undefined?

Not at the moment, no. The undefined value trickles down from StringBytes::Encode() in src/string_bytes.h. I don't think there are other code paths that return an empty Local<String> but I didn't check exhaustively.

mhart commented

FWIW, v8 (I assume?) spits out a slightly better error when trying to concat strings:

$ node -p 'var a = "a"; for (var i = 0; i < 27; i++) a = a + a; a.length'
134217728
$ node -p 'var a = "a"; for (var i = 0; i < 28; i++) a = a + a; a.length'
[eval]:1
var a = "a"; for (var i = 0; i < 28; i++) a = a + a; a.length
                                                ^

RangeError: Invalid string length
    at [eval]:1:49
    at Object.exports.runInThisContext (vm.js:54:17)
    at Object.<anonymous> ([eval]-wrapper:6:22)
    at Module._compile (module.js:434:26)
    at node.js:566:27
    at doNTCallback0 (node.js:407:9)
    at process._tickCallback (node.js:336:13)

Is the size limit exposed at all? If it were I'd be happy to add a PR that checks, in the case of an undefined result, whether the Buffer length is greater than it.

I don't think it is exposed in JS. The exact limit is (1 << 28) - 16 (268435440) and is defined in

static const int kMaxLength = (1 << 28) - 16;

I would love to take a stab at this if people are open to it.

/cc @jasnell

Works for me!
On Oct 6, 2015 12:49 PM, "Myles Borins" notifications@github.com wrote:

I would love to take a stab at this if people are open to it.

/cc @jasnell https://github.com/jasnell

โ€”
Reply to this email directly or view it on GitHub
#3175 (comment).

@MylesBorins Seems the discussion addresses more than just the error message. Mind identifying the specifics of what you'll be working on?

I specifically was interested in a better error message when toString is called on a Buffer larger than kStringMaxLength

That seemed like a pretty easy one to knock out (I've basically got it done already).

mhart commented

@MylesBorins I think the discussion was around how to detect the limit though โ€“ have you exposed kMaxLength?

mhart commented

I mean, the code comments in v8.h for NewFromUtf8, NewFromOneByte and NewFromTwoByte do say "Only returns an empty value when length > kMaxLength" so perhaps as @bnoordhuis suggested it's the only reason for this particular code path... Are the comments enough to go on?

I take back what I said about having this done. I was hard coding the size, which in retrospect is not robust enough.

@mhart I don't suggest attempting to pre-detect the limit. Let v8 try and it will return whether it could or not. It's both more future proof and allows limitations on system resources to also return undefined.

@MylesBorins Improving the error message is a good place to start.

mhart commented

@trevnorris not sure what you mean? I wasn't suggesting to pre-detect the limit.

mhart commented

Although I don't exactly know what pre-detect means ๐Ÿ˜ธ

@mhart

I think the discussion was around how to detect the limit though

This made me believe that you intended on detecting the kMaxStringLength from the string stored in the buffer. In the utf-8 case this isn't trivial. Also, it's an optimization for the uncommon path. Allow the string to be written immediately then handle it on failure. Don't attempt to error early.

mhart commented

Yes, I was talking about handling it on failure: #3175 (comment)

mhart commented

detecting the kMaxStringLength from the string

No no โ€“ I was talking about kMaxLength that's defined in v8.h โ€“ so if the result comes back as undefined, then check if the reason it's undefined is because the buffer length is greater than kMaxLength โ€“ then we can give the user a much nicer error, including what the actual current limit is.

@mhart v8::String::kMaxLength is process.binding('buffer').kStringMaxLength. It's been exposed that way for testing. Should be able to use that for your error message.

EDIT: Sorry. I've been typing kMaxStringLength instead of kStringMaxLength.

For reference, here's where it's exposed: https://github.com/nodejs/node/blob/v4.1.2/src/node_buffer.cc#L979-L981

mhart commented

Oh cool! So I guess the only trickiness comes with multi-byte strings? I mean if Buffer.length > kMaxStringLength then it's not necessarily the case that string.length > kMaxStringLength โ€“ but... we could just add that to the error message anyway โ€“ in a way that doesn't necessarily imply it was the definitive cause...?

"toString failed โ€“ maximum string length is 268435440" or similar?

@mhart Sorry. It's actually a bit deeper than that. If we want to do it properly then we'll have to detect if the return value from StringBytes::Encode(), as used in src/node_buffer.cc, is empty. If it is then we can throw a specific error message. Though that will require converting all of StringBytes::Encode() to use the MaybeLocal<> v8 API so we can catch if the return value is empty (the current API used will throw immediately).

@MylesBorins How are you approaching the problem?

mhart commented

@trevnorris Ah, I assumed "empty" in StringBytes::Encode -> "undefined" in JS. So I assumed we could just do the check in JS after an undefined return value.

mhart commented

ie:

if (result === undefined) {
  var kMaxStringLength = process.binding('buffer').kMaxStringLength;
  if (this.length > kMaxStringLength)
    throw new Error('toString failed โ€“ maximum string length is ' + kMaxStringLength);
  else
    throw new Error('toString failed');
}
mhart commented

(that said, it does look like the Local<String> versions have all been deprecated by the MaybeLocal<String> versions โ€“ so moving to MaybeLocal is something that'll need to be done at some point anyway โ€“ not sure if that's orthogonal to the discussion here though)

Hah. Totally missed how Buffer#toString() checks result === undefined.

With the kStringMaxLength check you need to do the following:

if ((encoding === 'ucs2' || encoding === 'ucs-2' ||
    encoding === 'utf16le' || encoding === 'utf-16le') &&
    this.length > (kStringMaxLength / 2))
  throw new Error('toString failed - buffer larger than maximum string size');

if ((encoding === 'utf8' || encoding === 'utf-8') &&
    this.length > kStringMaxLength)
  throw new Error('we don\'t really know this string size...');

if (this.length > kStringMaxLength)
  throw new Error('assuming using a one byte encoding here');

EDIT: This is approximate and messy. Which is why simply checking if the C++ value is empty or not is the more sure way.

mhart commented

Right, exactly, something like that.

Now chances are in the utf-8 case, AFAICT, that the reason it's undefined is because of the length, regardless of whether we can detect it or not, which is why I was suggesting an error message that suggests it might be the string length but doesn't necessarily imply it ๐Ÿ˜ˆ

mhart commented

(Kinda weird that v8 imposes a character limit instead of a byte limit, no? Assuming the limit is there to impose some sort of memory constraint anyway...)

So binding is already exposed at the top of the file, and in fact kMaxLength is already being exported.

> var buffer = require('buffer');
> console.log(buffer.kMaxLength);
2147483647

The result is roughly 8 times (which is what I would expect in converting bits to bytes), but the result is off by 14.875... which confuses me.

So we should be able to easily check the length if the result is undefined

  if (result === undefined) {
    var error = 'toString failed';
    // naive check, needs to also check encoding
    if (this.length > binding.kMaxLength / 4) {
      error += ': Buffer larger than kStringMaxLength';
    }
    throw new Error(error);
  }

That being said I'm not sure what to do about the rogue 14.875 bytes

mhart commented

@MylesBorins I think that's a different limit you're looking at. We're talking about https://github.com/nodejs/node/blob/v4.1.2/src/node_buffer.cc#L979-L981 which is https://github.com/nodejs/node/blob/v4.1.2/deps/v8/include/v8.h#L2083 which is defined as @targos pointed out is (1 << 28) - 16

oh my bad... so then we can just augment the above to

  if (result === undefined) {
    var error = 'toString failed';
    // naive check, needs to also check encoding
    if (this.length > binding.kStringMaxLength {
      error += ': Buffer larger than kStringMaxLength';
    }
    throw new Error(error);
  }

and I think we are back at something that works :D

mhart commented

Right, well that's what I proposed here: #3175 (comment) and @trevnorris pointed out the complexities around encodings here: #3175 (comment)

cool. Thanks for being patient with me!

Question is, is it possible that undefined is returned in any other case than the string was too large? e.g. if the machine has run out of memory. If not then we can simply skip any limit checks.

I'd be more than happy to put together some tests to figure out if this is the case but not entirely sure where to start. I'm on IRC if someone wants to take some time to try and get me started in the right direction

Trott commented

Adding a help wanted label. Feel free to remove if that's misguided...

Trott commented

For what it's worth, the error message here has been marginally improved to Error: "toString()" failed (although I'm not sure why the quotation marks are used--that seems peculiar--but the addition of parentheses are a marginal improvement).

And, of course, it still doesn't indicate why it failed though.

I... don't understand why the limit is 256 MB :(

mhart commented

There's an issue open on v8 too, if you'd like to star it: https://bugs.chromium.org/p/v8/issues/detail?id=6148

Can we rename the issue from "String length limit is small-ish" to "String length limit is small-af"

mhart commented

Great news! v8 has upped the limit to ~1GB on 64-bit archs: https://chromium-review.googlesource.com/c/570047

Hopefully it comes with a nicer error message when it fails too (although that's possibly still a Node.js responsibility?)

After 63eb267 this now returns a message Cannot create a string larger than 0x3fffffe7 bytes with the error code set to ERR_STRING_TOO_LARGE.

I am running into this issue when I try to download .tar file via request and piping it to tar.x(). Does this issue imply that the maximum body size that I can download via request.get() is 1G for 64-bit machine?

nvm.. I just had to set "encoding: null" ... which prevents the body to be passed to toString()