More useful error than `toString failed` when Buffer length greater than kMaxLength?
mhart opened this issue ยท 46 comments
I'm assuming there's some sort of memory constraint trying to convert buffers into strings?
$ uname -a
Darwin ReBuke-Pro.local 15.0.0 Darwin Kernel Version 15.0.0: Wed Aug 26 16:57:32 PDT 2015; root:xnu-3247.1.106~1/RELEASE_X86_64 x86_64
$ node --version
v4.1.1
$ node -p 'new Buffer(268435440).toString().length'
268435440
But:
$ node -p 'new Buffer(268435441).toString().length'
buffer.js:378
throw new Error('toString failed');
^
Error: toString failed
at Buffer.toString (buffer.js:378:11)
at [eval]:1:23
at Object.exports.runInThisContext (vm.js:54:17)
at Object.<anonymous> ([eval]-wrapper:6:22)
at Module._compile (module.js:434:26)
at node.js:566:27
at doNTCallback0 (node.js:407:9)
at process._tickCallback (node.js:336:13)
Firstly, I'm not sure that this should even be an error (are we really only limited to 256MB? That's... not very much). This seems to have been addressed in the past, but I feel like a regression might have occurred: #1374 (comment)
But that aside, can we actually detect what the error is and report on it in a more detailed way? The current toString
check provides little insight.
256 MB is the limit for strings in V8 so no, it's not a regression. What would you report in more detail?
Anything would be better than that TBH.
It's not obvious to the user what's actually gone wrong.
How about "Cannot create string larger than 256MB"? Seems a lot clearer to me.
I think a pull request to that effect would be acceptable.
But is that the only error that can occur? Looking at the code at the moment, the check is pretty basic โ is there no way of getting more detail as to why result
may be undefined
?
Not at the moment, no. The undefined value trickles down from StringBytes::Encode() in src/string_bytes.h. I don't think there are other code paths that return an empty Local<String>
but I didn't check exhaustively.
FWIW, v8 (I assume?) spits out a slightly better error when trying to concat strings:
$ node -p 'var a = "a"; for (var i = 0; i < 27; i++) a = a + a; a.length'
134217728
$ node -p 'var a = "a"; for (var i = 0; i < 28; i++) a = a + a; a.length'
[eval]:1
var a = "a"; for (var i = 0; i < 28; i++) a = a + a; a.length
^
RangeError: Invalid string length
at [eval]:1:49
at Object.exports.runInThisContext (vm.js:54:17)
at Object.<anonymous> ([eval]-wrapper:6:22)
at Module._compile (module.js:434:26)
at node.js:566:27
at doNTCallback0 (node.js:407:9)
at process._tickCallback (node.js:336:13)
Is the size limit exposed at all? If it were I'd be happy to add a PR that checks, in the case of an undefined
result, whether the Buffer length is greater than it.
I don't think it is exposed in JS. The exact limit is (1 << 28) - 16
(268435440) and is defined in
Line 2083 in 64beab0
I would love to take a stab at this if people are open to it.
/cc @jasnell
Works for me!
On Oct 6, 2015 12:49 PM, "Myles Borins" notifications@github.com wrote:
I would love to take a stab at this if people are open to it.
/cc @jasnell https://github.com/jasnell
โ
Reply to this email directly or view it on GitHub
#3175 (comment).
@MylesBorins Seems the discussion addresses more than just the error message. Mind identifying the specifics of what you'll be working on?
I specifically was interested in a better error message when toString is called on a Buffer larger than kStringMaxLength
That seemed like a pretty easy one to knock out (I've basically got it done already).
@MylesBorins I think the discussion was around how to detect the limit though โ have you exposed kMaxLength
?
I mean, the code comments in v8.h for NewFromUtf8
, NewFromOneByte
and NewFromTwoByte
do say "Only returns an empty value when length > kMaxLength" so perhaps as @bnoordhuis suggested it's the only reason for this particular code path... Are the comments enough to go on?
I take back what I said about having this done. I was hard coding the size, which in retrospect is not robust enough.
@mhart I don't suggest attempting to pre-detect the limit. Let v8 try and it will return whether it could or not. It's both more future proof and allows limitations on system resources to also return undefined.
@MylesBorins Improving the error message is a good place to start.
@trevnorris not sure what you mean? I wasn't suggesting to pre-detect the limit.
Although I don't exactly know what pre-detect means ๐ธ
I think the discussion was around how to detect the limit though
This made me believe that you intended on detecting the kMaxStringLength
from the string stored in the buffer. In the utf-8 case this isn't trivial. Also, it's an optimization for the uncommon path. Allow the string to be written immediately then handle it on failure. Don't attempt to error early.
Yes, I was talking about handling it on failure: #3175 (comment)
detecting the kMaxStringLength from the string
No no โ I was talking about kMaxLength
that's defined in v8.h โ so if the result comes back as undefined, then check if the reason it's undefined is because the buffer length is greater than kMaxLength
โ then we can give the user a much nicer error, including what the actual current limit is.
@mhart v8::String::kMaxLength
is process.binding('buffer').kStringMaxLength
. It's been exposed that way for testing. Should be able to use that for your error message.
EDIT: Sorry. I've been typing kMaxStringLength
instead of kStringMaxLength
.
For reference, here's where it's exposed: https://github.com/nodejs/node/blob/v4.1.2/src/node_buffer.cc#L979-L981
Oh cool! So I guess the only trickiness comes with multi-byte strings? I mean if Buffer.length > kMaxStringLength then it's not necessarily the case that string.length > kMaxStringLength โ but... we could just add that to the error message anyway โ in a way that doesn't necessarily imply it was the definitive cause...?
"toString failed โ maximum string length is 268435440" or similar?
@mhart Sorry. It's actually a bit deeper than that. If we want to do it properly then we'll have to detect if the return value from StringBytes::Encode()
, as used in src/node_buffer.cc
, is empty. If it is then we can throw a specific error message. Though that will require converting all of StringBytes::Encode()
to use the MaybeLocal<>
v8 API so we can catch if the return value is empty (the current API used will throw immediately).
@MylesBorins How are you approaching the problem?
@trevnorris Ah, I assumed "empty" in StringBytes::Encode
-> "undefined" in JS. So I assumed we could just do the check in JS after an undefined return value.
ie:
if (result === undefined) {
var kMaxStringLength = process.binding('buffer').kMaxStringLength;
if (this.length > kMaxStringLength)
throw new Error('toString failed โ maximum string length is ' + kMaxStringLength);
else
throw new Error('toString failed');
}
(that said, it does look like the Local<String>
versions have all been deprecated by the MaybeLocal<String>
versions โ so moving to MaybeLocal
is something that'll need to be done at some point anyway โ not sure if that's orthogonal to the discussion here though)
Hah. Totally missed how Buffer#toString()
checks result === undefined
.
With the kStringMaxLength
check you need to do the following:
if ((encoding === 'ucs2' || encoding === 'ucs-2' ||
encoding === 'utf16le' || encoding === 'utf-16le') &&
this.length > (kStringMaxLength / 2))
throw new Error('toString failed - buffer larger than maximum string size');
if ((encoding === 'utf8' || encoding === 'utf-8') &&
this.length > kStringMaxLength)
throw new Error('we don\'t really know this string size...');
if (this.length > kStringMaxLength)
throw new Error('assuming using a one byte encoding here');
EDIT: This is approximate and messy. Which is why simply checking if the C++ value is empty or not is the more sure way.
Right, exactly, something like that.
Now chances are in the utf-8 case, AFAICT, that the reason it's undefined is because of the length, regardless of whether we can detect it or not, which is why I was suggesting an error message that suggests it might be the string length but doesn't necessarily imply it ๐
(Kinda weird that v8 imposes a character limit instead of a byte limit, no? Assuming the limit is there to impose some sort of memory constraint anyway...)
So binding
is already exposed at the top of the file, and in fact kMaxLength
is already being exported.
> var buffer = require('buffer');
> console.log(buffer.kMaxLength);
2147483647
The result is roughly 8 times (which is what I would expect in converting bits to bytes), but the result is off by 14.875... which confuses me.
So we should be able to easily check the length if the result is undefined
if (result === undefined) {
var error = 'toString failed';
// naive check, needs to also check encoding
if (this.length > binding.kMaxLength / 4) {
error += ': Buffer larger than kStringMaxLength';
}
throw new Error(error);
}
That being said I'm not sure what to do about the rogue 14.875 bytes
@MylesBorins I think that's a different limit you're looking at. We're talking about https://github.com/nodejs/node/blob/v4.1.2/src/node_buffer.cc#L979-L981 which is https://github.com/nodejs/node/blob/v4.1.2/deps/v8/include/v8.h#L2083 which is defined as @targos pointed out is (1 << 28) - 16
oh my bad... so then we can just augment the above to
if (result === undefined) {
var error = 'toString failed';
// naive check, needs to also check encoding
if (this.length > binding.kStringMaxLength {
error += ': Buffer larger than kStringMaxLength';
}
throw new Error(error);
}
and I think we are back at something that works :D
Right, well that's what I proposed here: #3175 (comment) and @trevnorris pointed out the complexities around encodings here: #3175 (comment)
cool. Thanks for being patient with me!
Question is, is it possible that undefined
is returned in any other case than the string was too large? e.g. if the machine has run out of memory. If not then we can simply skip any limit checks.
I'd be more than happy to put together some tests to figure out if this is the case but not entirely sure where to start. I'm on IRC if someone wants to take some time to try and get me started in the right direction
Adding a help wanted
label. Feel free to remove if that's misguided...
For what it's worth, the error message here has been marginally improved to Error: "toString()" failed
(although I'm not sure why the quotation marks are used--that seems peculiar--but the addition of parentheses are a marginal improvement).
And, of course, it still doesn't indicate why it failed though.
I... don't understand why the limit is 256 MB :(
There's an issue open on v8 too, if you'd like to star it: https://bugs.chromium.org/p/v8/issues/detail?id=6148
Can we rename the issue from "String length limit is small-ish" to "String length limit is small-af"
Great news! v8 has upped the limit to ~1GB on 64-bit archs: https://chromium-review.googlesource.com/c/570047
Hopefully it comes with a nicer error message when it fails too (although that's possibly still a Node.js responsibility?)
After 63eb267 this now returns a message Cannot create a string larger than 0x3fffffe7 bytes
with the error code set to ERR_STRING_TOO_LARGE
.
I am running into this issue when I try to download .tar file via request and piping it to tar.x(). Does this issue imply that the maximum body size that I can download via request.get() is 1G for 64-bit machine?
nvm.. I just had to set "encoding: null" ... which prevents the body to be passed to toString()