stdout/stderr buffering considerations
ChALkeR opened this issue · 62 comments
I tried to discuss this some time ago on IRC, but postponed it for quite a long time. I also started discussing this in #1741, but I would like to extract the more specific discussion into a separate issue.
I may be missing some details, but I will try to give a quick overview here.
Several issues here:
- Many calls to console.log (e.g. calling it in a loop) could chew up all the memory and die — #1741, #2970, #3171, #18013.
- console.log has different behavior while printing to a terminal and being redirected to a file — #1741 (comment).
- Output is sometimes truncated — #6297; there were other ones as far as I remember.
- The behaviour seems to differ across platforms.
As I understand it — the output has an implicit write buffer (as it's non-blocking) of unlimited size.
One approach to fixing this would be to:
- Introduce an explicit cyclic write buffer.
- Make writes to that cyclic buffer blocking.
- Make writes from the buffer to the actual output non blocking.
- When the cyclic buffer reaches its maximum size (e.g. 10 MiB) — block further writes to the buffer until a corresponding part of it is freed.
- On (normal) exit, make sure the buffer is flushed.
For almost all cases, except for the ones that are currently broken, this would behave as a non-blocking buffer (because writes to the buffer are considerably faster than writes from the buffer to file/terminal).
For cases when the data is being piped to the output too quickly and when the output file/terminal does not manage to output it at the same rate — the write would turn into a blocking operation. It would also be blocking at the exit until all the data is written.
Another approach would be to monitor (and limit) the size of data that is contained in the implicit buffer coming from the async queue, and make the operations block when that limit is reached.
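For illustration, here is a minimal userland sketch of that second approach, assuming a 10 MiB cap. boundedWrite is a hypothetical helper, and falling back to fs.writeSync on fd 1 is just an illustrative way to get a blocking write, not anything Node core does today:

const fs = require('fs');

const MAX_PENDING = 10 * 1024 * 1024; // 10 MiB cap, as in the proposal above

function boundedWrite(chunk) {
  // writableLength counts the bytes queued on the stream but not yet written.
  if (process.stdout.writableLength > MAX_PENDING) {
    // Over the cap: write synchronously to fd 1 so this call blocks and the
    // producer slows down instead of piling more chunks into memory.
    // A real implementation would also have to preserve ordering with the
    // data already queued on the stream; this sketch ignores that.
    fs.writeSync(1, chunk);
  } else {
    process.stdout.write(chunk);
  }
}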
process.stdout._handle.setBlocking(true); as proposed in #1741 (comment) and #6297 (comment) would considerably hurt the performance. Exposing that isn't a good solution to any of this.
process.stdout._handle.setBlocking(true);
would considerably hurt the performance
Developers are a smart bunch. With the proposed setBlocking API they can decide for themselves if they want stdout/stderr to block on writes. The default could still remain non-blocking. There's no harm in giving people the option. Having CLI applications block on writes for stdout/stderr is quite desirable. This is not unlike all the Sync calls in the fs module. In a high-performance server scenario you don't have to use them, but they are there if you need them.
There's no harm in giving people the option.
There is: every option comes at a cost on both sides. That option would need to be supported, and it would actually just toggle the behaviour between two states that are far from ideal in many cases.
As for the option here, the said cyclic buffer maximum size could perhaps be configured using a command-line argument, with a buffer size of 0 turning it into a fully blocking API.
toggle the behaviour between two states that are far from being ideal in many cases
The proposed API process.stdout._handle.setBlocking(Boolean) could be documented to only be guaranteed to work before any output is sent to the stream.
On UNIX it is a trivial system call on a fd. It's pretty much the same thing on Windows as far as I recall.
Edit: See #6297 (comment)
The proposed API process.stdout._handle.setBlocking(Boolean) could be documented to only be guaranteed to work before any output is sent to the stream.
We should not be documenting underscore-prefixed properties. This would need to be a separate, true public API.
In CLI tools, I don't want stdout/stderr to be blocking though; rather, I want to be able to explicitly exit the process and have stdout/stderr flushed before exiting. Has there been any discussion about introducing a way to explicitly exit the process gracefully, meaning flushing stdout/stderr and then exiting? Maybe something like a process.gracefulExit() method? In ES2015 modules, we can no longer return in the top scope, so short-circuiting will then be effectively impossible without a nesting mess.
The exit module is a drop-in replacement for process.exit() that effectively waits for stdout/stderr to be drained. Other streams can optionally be drained as well. Something like it really ought to be rolled into node core under a different name.
Edit: upon further analysis, the exit module is no longer recommended.
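A graceful exit along these lines can be approximated in userland. This is only a sketch: gracefulExit is the hypothetical name from above, and it leans on the documented behavior that a write callback fires once its chunk has been flushed, and that chunks are processed in order:

function gracefulExit(code) {
  let pending = 2;
  const done = () => { if (--pending === 0) process.exit(code); };
  // An empty trailing write acts as a flush barrier: its callback runs only
  // after everything queued before it has been handed to the OS.
  process.stdout.write('', done);
  process.stderr.write('', done);
}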
I think there's a slight discrepancy in interpretation here.
Are we talking about console blocking, or console buffering? Albeit similar, these are two different considerations.
Further, I'm not entirely sure what the problem is you're trying to fix, @ChALkeR. Could you elaborate on a problem that is currently being faced? I'm not grokking what it is that's broken here.
One thing to note is that on most systems (usually UNIX systems) stdout is buffered to begin with, whereas stderr is usually not (or sometimes just line buffered).
However, an explicit fflush(stdout) or fflush(stderr) is not necessary upon a process closing because these buffers are handled by the system and are flushed when main() returns. If the buffer grows too large during execution, the system automatically flushes it.
This is why the following program still outputs while the loop is running, and on most terminal emulators the same reason why it looks like it's 'jittery'.
#include <stdio.h>
int main(void) {
for (int i = 10000000; i > 0; i--) {
printf("This is a relatively not-short line of text. :) %d\n", i);
}
return 0;
}

Blocking, on the other hand, doesn't make much sense. What do you want to block? Are you suggesting we fflush(stdout) every time we console.log()? Because if so, that is a terrible idea.
Developers are a smart bunch. With the proposed setBlocking API they can decide for themselves if they want stdout/stderr to block on writes.
I respectfully disagree. There is absolutely no point in blocking on stdout/stderr. It's very rarely done in the native world and when it is it's usually for concurrency problems (two FILE* handles being written to by two threads that cannot otherwise be locked; i.e. TTY descriptors). Let the system handle it.
The above program without fflush(stdout) on every iteration produced the following average:
real 6.32s
sys 0.54s
405 involuntary context switches
and with fflush(stdout) on each iteration:
real 7.48s
sys 0.56s
1607 involuntary context switches
That's an 18.3% time increase on this microbenchmark alone. I'm in the 'time is relative and isn't a great indicator of actual performance' camp, so I included the involuntary context switch count (voluntary context switches would be the same between the two runs, obviously).
The results show that fflush(stdout) upon every message encouraged the system to switch out of the process 1202 more times - that's 296% higher than ye olde system buffering.
Further, the feature that was requested by @sindresorhus is already part of how executables behave - at least on UNIX systems, all buffered file descriptors are flushed prior to/during close()-ing or fclose()-ing, so flushing before a process exits is a needless step and exposing a new function for it would be completely unnecessary.
I'm still lost as to what is trying to be fixed here... I wouldn't mind seeing a process.stdout.flush() method, but it'd be abused in upstream dependencies and I'd find myself writing some pretty annoyed issues in that case. I've never run into a problem with Node's use of standard I/O...
Also, just to reiterate - please don't even start with a .setBlocking() method. That's asking for trouble. The second a dependency makes standard I/O blocking then everything else is affected and then I have to answer to my technical lead about why my simple little program is switching out of context a bazillion times when it didn't before.
Further, I'm not entirely sure what the problem is you're trying to fix, @ChALkeR. Could you elaborate on a problem that is currently being faced? I'm not grokking what it is that's broken here.
See the issues I linked above.
Blocking, on the other hand, doesn't make much sense. What do you want to block? Are you suggesting we fflush(stdout) every time we console.log()? Because if so, that is a terrible idea.
No, of course I don't. console.log is not the same as printf in your example.
It's an async operation. I don't propose fflush, I propose to somehow make sure that we don't fill up the async queue with console.logs.
@ChALkeR are we actually seeing this happen?
var str = new Array(8000).join('.');
while (true) {
console.log(str);
}

works just fine. In what realistic scenario are we seeing the async queue fill up? The three linked issues are toy programs that are meant to break Node, and would break in most languages, in just about all situations, on all systems (resources are finite, obviously).
The way I personally see it, there are tradeoffs to having coroutine-like execution paradigms and having slightly higher memory usage is a perfectly acceptable tradeoff for the asynchronicity Node provides. It doesn't handle extreme cases because those cases are well outside any normal or even above normal scenario.
The first case, which mentions a break in execution to flush the async buffer, is perfectly reasonable and expected.
Side note: thanks for clarifying :) that makes more sense. I was hoping it wasn't seriously proposed we fflush() all the time. Just had to make sure to kill that idea before anyone ran with it.
- What's your platform?
- Are you sure that you are not redirecting the output anywhere?
@Qix- Are you saying that memory usage eventually stabilizes when you run this? How much memory does it consume for you?
@ChALkeR ah no, memory usage goes off the charts. Again, though, is this happening in practical environments outside of stress tests?
@Qix- Yes, it does. Users have reported it several times already, see links above.
for (var i = 0; i < 1000000000; i++) {
value = nextPrime(value);
console.log(value);
}

This is definitely a stress test. This isn't a practical or realistic use of Node.js - an async scripting wrapper for IO-heavy applications.
For instance,
function loggit() {
console.log('hello');
setImmediate(loggit);
}
loggit()

tops out at 15M memory consumption. I've tweaked #3171's code to reflect the changes here, which now hovers around 45M (instead of inflating to over a gig in about a minute).
Of course a stupidly huge while/for loop with a ton of non-blocking IO (not to mention, a function that is solving an NP hard problem) is going to cause problems. Don't do it. It's a simple misunderstanding of how Node works, and if it were up to me I'd brush it off as an error on the developer's part.
The second we start baby-sitting bad design like all three of the examples you mentioned and start ad-hoc adding methods like .setBlocking(true) is when we start stepping on the toes of dependents down-stream, which begins to break everything.
Are we talking about console blocking, or console buffering? Albeit similar, these are two different considerations.
Either blocking tty writes or graceful drain of stdout/stderr upon process.exit() would fix the problem at hand with truncated tty output.
I think both APIs should be part of node core. Devs should have the option to have stdout/stderr block if they so choose, or have stdout/stderr gracefully flushed if they find that more appropriate.
Logging to the console 10000000 times in a tight loop is not real life behavior. In reality performance oriented servers rarely log to the console - and when they do it's important that the information is output. But if you did want to log to the console 1e7 times (and run out of memory due to another known node issue) then you would not enable blocking on stdout - so it's a red herring.
stdout/stderr should flush on program exit anyway. It's the fact node uses the exit() call that messes quite a few things up. That's actually an easy enough fix with Node specifically.
My problem with .setBlocking() is that it affects the process globally, not just specific pieces of code or files. That means a dependency could mess up how my code functions without really understanding what .setBlocking() does.
Comment just posted in other thread: #6297 (comment)
process.stdout.flush() and process.stderr.flush()

It's not like C where the data is buffered in libc and can be flushed. The unflushed data is in some Javascript data structure in V8 and the event loop has been halted.
@kzc yes, that is also a problem. exit() in and of itself is the cause of your specific problem, as has been mentioned quite a few times.
@eljefedelrodeodeljefe #6456 is only about process.exit(), and it doesn't cover the issues and the proposal outlined here. I don't see a reason for burying this until we have a proper discussion.
alright
The three linked issues are toy programs that are meant to break Node, and would break in most languages, in just about all situations, on all systems (resources are finite, obviously).
I wanted to evaluate this statement, so I built programs like the one in #2970 for C#, Java, Python, and PHP (on Windows). None of these languages exhibited the memory consumption behavior that Node.js does.
@ben-page ok? Those languages are synchronous and blocking by default. Apples to oranges comparison - they have completely different use cases.
@Qix- I agree. That's my point, mostly. One can't really argue these examples are "toy programs that are meant to break Node" and that they "would break in most languages, in just about all situations, on all systems". I imagine there is very little overlap in those two sets. In fact, these examples would break in very few languages, because most languages block when writing to the console (as you said).
P.S. Erlang does not have this problem.
Moreover, I don't think it's fair to dismiss these issues as "toy programs". An MCVE is inherently a toy program. But there's usually a real world reason that led to the issue being reported.
Additionally, I think it's worth considering that it's not necessarily obvious that console.log() is asynchronous. Very few languages operate that way. And it's unlike other asynchronous functions in Node.js, which support callbacks or events. This behavior can mimic a memory leak and has clearly caused multiple people trouble.
I think it's worth considering that it's not necessarily obvious that console.log() is asynchronous
Node.js is an async I/O environment. If you're using node and not understanding how it works at least at that very basic level, you're doing yourself a disservice. ;) Asynchronicity can manifest itself in many ways. Callbacks aren't even necessarily 'async' by nature (they can be synchronous).
I agree with you completely, but that's not really a response to my point.
I did not say that people should not have to understand the language or that callbacks are necessarily asynchronous. I said that console.log is different from other asynchronous functions in that it does not support any method of flow control or error handling (such as callbacks).
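For contrast, the underlying stream does expose both, using only documented APIs (resumeProducing below is a placeholder for whatever the caller does to resume its output loop):

// write() returns false once the internal queue passes the high-water mark,
// and the optional callback fires after the chunk has been flushed.
const ok = process.stdout.write('hello\n', (err) => {
  if (err) console.error('write failed:', err);
});
if (!ok) {
  // Back off until the queue drains, then resume producing output.
  process.stdout.once('drain', resumeProducing);
}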
@ben-page not really Node's fault, though. console.log exists in Node because it exists in browsers (I believe console is a basic global within V8 itself but I could be wrong).
@Qix- Node.js has its own console.log, and that uses node’s own process.stdout under the hood.
@addaleax Ah okay. That makes sense since that's pretty implementation specific anyway.
This causes problems when you want the process to compute something and return a result then exit. If that result is large data (e.g. encoded binary data, or a large json string) then it's currently difficult to do. Here's a simple example:
var largeData = "SomeData".repeat(1024*1024) + ' ..... END!';
console.log(largeData); //using process.stdout.write here makes no difference either
process.exit();

You'll see that ...END! never gets printed to the console (at least on Node 6.3 on linux/bash). The data gets truncated.
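One workaround often suggested for this pattern, sketched here: set process.exitCode instead of calling process.exit(), and let the process exit on its own once the event loop empties, which gives the queued stdio writes a chance to complete.

var largeData = "SomeData".repeat(1024 * 1024) + ' ..... END!';
console.log(largeData);
// No process.exit(): the process terminates naturally once the write queue
// drains, carrying this exit code.
process.exitCode = 0;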
All that's really needed is a simple callback for when the stdout command has finished:
process.stdout.write(largeData, function() { process.exit(); });

I've just wasted time debugging an issue when the bug was from some other part of the program, due to STDERR getting truncated on crash and sending me the wrong way... it would be nice if the error stream behaved the same way as it has in virtually every other language I've ever used (many of which will buffer-by-default for other streams). Even for a server environment, which is more important? Allowing many millions of crashes to be generated per second, or being able to log their details completely?
I did some experiments on the performance of process.stdio, and here are my findings (along with some issues with Node.js process.stdio). I believe it's related, so I'm posting it here...
I was originally inspired by this post How is GNU's yes so fast, and I wanted to write it in Node.js.
Take No. 1
while (true) {
console.log('y');
}

It came out with this:
$ node yes-1.js | pv -i 10 > /dev/null
^C03MiB 0:00:10 [ 822KiB/s] [ <=>
Comparing to gnu yes performance on my computer:
$ yes | pv -i 10 > /dev/null
^C.9GiB 0:00:10 [7.79GiB/s] [ <=>
Note that using process.stdout.write() is no better, since IO is much, much slower than an extra bunch of instructions.
How could we catch up with gnu yes? Buffering is the answer...
Take No. 2
var SIZE = 8192; // Same buf size with gnu-yes
var buf = Buffer.alloc(SIZE, 'y\n');
while (true) {
process.stdout.write(buf);
}

Give it a shot:
$ node yes-2.js | pv > /dev/null
FATAL ERROR: MarkCompactCollector: semi-space copy, fallback in old gen Allocation failed - JavaScript heap out of memory ]
118MiB 0:00:08 [14.3MiB/s] [ <=>
Wait... what happened to the heap? It's actually because of the slow IO... we have to wait for IO to finish when there is too much of it in flight; it's like closing the tap when the bathtub is full, or you'll get an "overflow".
Take No. 3
var SIZE = 8192;
var buf = Buffer.alloc(SIZE, 'y\n');
function yes3() {
while (true) {
if (process.stdout.write(buf)) {
// OK, we're fine
} else {
process.stdout.once('drain', yes3);
break;
}
}
}
yes3();

And the result is:
$ node yes-3.js | pv -i 10 > /dev/null
^C.9GiB 0:00:10 [2.65GiB/s] [ <=>
2.65GiB/s is tons better than 822KiB/s, but there is still room for improvement considering the 7.79GiB/s that gnu yes does. I figured out that on my machine, every call to a native C++ binding function takes ~3 microseconds, no matter how little code the C++ function runs. So let's do the math:
1 / 0.000003 * 8192 / 1024 / 1024 / 1024 = 2.54 GiB/s
The above is the top speed I can get with buffer size === 8192, so let me make the buffer bigger ...
Take No. 4
var SIZE = 65536;
var buf = Buffer.alloc(SIZE, 'y\n');
function yes4() {
while (true) {
if (process.stdout.write(buf)) {
// OK, we're fine
} else {
process.stdout.once('drain', yes4);
break;
}
}
}
yes4();

$ node yes-4.js | pv -i 10 > /dev/null
^C.6GiB 0:00:10 [6.16GiB/s] [ <=>
Buffer size 65536 is the best I've got using this approach. A bigger buffer slows it down (is this because of some memory operation that happens in Node.js core?)
(Figure: throughput of each take; the last (highest) bar is gnu yes.)
So what can we conclude so far? I'm afraid there is nothing here other than the same old rule of IO: use a buffer and make it faster by several orders of magnitude...
And then consider a few things that are Node.js specific:
Unlike process.stdout.write(), console.log() doesn't return anything, so there is no way for the user to know that they should wait. This could happen when (1) there is too much logging, (2) there is too much IO in an application, (3) the IO is slow. Real scenarios can be one or any combination of the above three, e.g.:
- Node.js test suites often crash due to the heavy use of console.log() when the CI server is slow;
- A stress test in a simulated environment with logging turned on often consumes an extra large amount of memory or even crashes. Everything is OK when logging is turned off.
- Nobody notices the false return value of process.stdout.write() and waits for the drain event; some of them are using console.log() and won't be able to do so anyway.
- I can go on and on, but you know what I mean.
Even if there is no memory issue or crash like those I've listed above, the slow IO will still cause issues, e.g.:
- The application won't be fast enough (IO throughput) for users who don't manage a buffer themselves.
- No current logging framework is buffered; they're all slow, because there are many calls to C++ binding functions and each takes at least ~3 microseconds plus the cost of JS/C++ context switching; think of the accumulation of thousands of them. Note that a pure JavaScript string concat in a JavaScript function only costs 0.05 microseconds (and involves no JS/C++ context switching); this is the cost of "writing a single log line" when buffering is in use.
- Current logging frameworks have a very low speed ceiling: if the average log line length is 80 chars, then the top speed is 1 / 0.000003 * 80 / 1024 / 1024 = 25 MiB/s, and usually an application is far behind this top speed. This could force users to use less logging in both development and production environments. Even if someone writes a super-fast logging framework, no applications could benefit from it unless all npm module owners are willing to depend on the new one.

So I think this leads to the question: can Node.js core do something about it? E.g. providing a new set of APIs for buffered IO, and/or slightly modifying the behavior of the current console.log()?
I'd like to discuss more and I'd love to contribute PRs. I'm doing more experiments right now, trying to get a solid solution with an obvious advantage. And I think I should raise the question here for discussion.
I've done a prototype of process.stdout.write() and it yields a 20x performance gain. Here are the details.
(20x is not a big deal when applying a proper buffering mechanism to IO; we can expect more, see the analysis below)
- The principle is to use a buffer, say: 64 KiB (this is a good value in my previous experiment), to avoid a lot of actual writes.
- Reduce the calls from JS to C++, each of which costs a couple of microseconds (1e-6 s).
- Avoid the use of Node Buffer, which calls into C++; use a JavaScript string as the buffering mechanism. Concat strings into a bigger one before writing (let's call this bigger one a "string buffer").
- The modified process.stdout.write() that runs in the happy path (only concatenating the string onto the current "string buffer") costs <100 ns. Compared to the original microsecond-scale version, this increases the performance 10+ times. The time cost of the actual write is amortized over all the preceding calls; it's still fast because the string buffer is pretty big (e.g. a typical 100-char log line vs. a 64 KiB string buffer: amortized over 600+ calls).
- But there is a drawback: a bottleneck was found when writing strings to process.stdout.write(). Here are some details on this bottleneck (should I file a new issue for it?): process.stdout.write() always converts JavaScript strings into a Node Buffer, which slows the IO throughput down to 1/5 or less. Typical data: when writing a Node Buffer, the overall throughput is 8.x GiB/s; when writing strings, the overall throughput is 1.6x GiB/s.
- I've found that there is a series of C++ binding functions writeUtf8String, writeAsciiString, ... and I believe that if we write some if-else to detect the (default) encoding argument in writeOrBuffer() (in _stream_writable.js), instead of always converting strings into a Node Buffer, we can eliminate this bottleneck in most cases.
- If we can address this bottleneck, the performance of this prototype can be 5 times faster. Adding these two together, a 100x performance gain is possible.
- Here is the pseudo code of the prototype (the real one is harder to read), simple but powerful.
let buf = '';
function cacheWriteString(str) {
if (typeof str === 'string') {
buf += str;
if (buf.length >= 65536) {
process.stdout.write(buf);
buf = '';
}
}
// Optional: flush buf in a certain interval
// e.g. setInterval(()=> {...}, 33);
}

- Maybe changing the behavior of the current process.stdout.write() is not a good idea since it will change the behavior of existing apps... if this is the case, a new API can be introduced, encouraging developers to use the new one for much better performance.
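For a sense of how the pseudo code above would be used (hypothetical usage, following the same sketch): route log lines through the string buffer, and note that whatever is left in buf at exit still needs a final flush, which is exactly the truncation hazard discussed in this thread.

cacheWriteString('some log line\n');
cacheWriteString('another log line\n');
// On (normal) exit, flush whatever remains in the string buffer:
if (buf.length > 0) {
  process.stdout.write(buf);
  buf = '';
}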
I think the point of this approach is to increase the performance of Node.js apps that use a large amount of data on process.stdout, e.g. logging, text processing (sed/grep alike), piping, ...
- When the performance is greatly increased, large amounts of log lines are on the table, which was not possible in the past because it would lead to either a crash of the app or the use of a large amount of memory.
- Even if a fair number of apps won't benefit much from this approach (since they're not doing much on standard output), Node.js, as a runtime, should still pursue the goal of higher throughput via its IO APIs.
Note I posted this comment last weekend and it's gone, not sure what happened, so I'm posting again (it's slightly different since I've lost the original one).
@kenny-y iteratively concatenating a large number of small strings had a significant memory impact last time I checked, due to the way strings are stored in V8 (which also makes them faster and more memory efficient for regular use-cases that do not deal with iteratively concatenating a large number of short strings).
What is the memory impact of your patch?
@ChALkeR Yes, I did run some experiments to measure the memory usage; here are the details:
mem-1.js is to find out the overall memory footprint of the 64 KiB string cache
// mem-1.js
const SIZE = 128;
const buf = Buffer.alloc(SIZE, 'y\n').toString();
let s = '';
while (s.length < 65536) {
s += buf;
}
setInterval(() => {
process.stdout.write(s);
}, 1000);

mem-2.js is to compare with the above one.
// mem-2.js
const SIZE = 128;
const buf = Buffer.alloc(SIZE, 'y\n').toString();
let s = 0;
while (s < 65536) {
s += 128;
}
setInterval(() => {
process.stdout.write(buf);
}, 1000);

Overall memory footprint of mem-1.js is 7.3 MiB
Overall memory footprint of mem-2.js is 7.2 MiB
If there are many strings allocated and then gc-ed on the fly, the memory impact on both should be identical on the JavaScript side.
There is a related memory leak bug (or is it?) found in various Node.js versions (including 8.10, 9.8, master, ...) when I tried to reach the maximum throughput with a small buffer or string. The program below can reproduce the leak.
Therefore I can't compare the overall memory footprint in that scenario, because my prototype only takes a constant amount of memory, say 37 MiB, but the program below will consume 100+ MiB ~ 1.x GiB of memory (usually sawtooth shaped, but sometimes it stays at 1.x GiB).
// yes-7.1.js
// Memory consumption 100+ ~ 1000+ MiB
const SIZE = 128;
const buf = Buffer.alloc(SIZE, 'y\n'); // memory consumption is the same when adding .toString()
function yes7() {
while (true) {
if (process.stdout.write(buf)) {
// OK, we're fine
} else {
process.stdout.once('drain', () => {
setTimeout(yes7, 0);
gc(); // requires command line option turned on
});
break;
}
}
}
yes7();

Run node yes-7.1.js | pv > /dev/null, or just node yes-7.1.js if you're patient.
@ChALkeR What do you say on this approach? Do you think it's a good idea to change existing behavior or stdout/stderr, or introduce new APIs with proper names suggesting that a buffer is in use?
I think I've found a memory leak when trying to constantly write a string to process.stdout; the string size is 8 KiB (or less). I'm investigating what's really going on... will open a new bug if it's not related.
P.S. It's not the case for large strings (e.g. 64KiB) -- the "V8 Heap Used" is a constant value.
@ChALkeR Is there any short term plan to introduce a buffer mechanism in stdout/stderr? I'm interested in this and I'd love to help.
I am running into this issue, and I am not sure how to solve it. As of now, I am load testing my application: I kick off a series of API calls and slowly scale that up to ~100,000 concurrent connections. I am doing some simple logging inside my function, and that logging is causing my application to crash at ~5,000 connections.
I have tried --max-old-space-size=7168, --optimize_for_size, --stack_size=4096, and an npm package called 'log-buffer'. None of it helps to address the issue. When I use process.stdout._handle.setBlocking(true), it works up until ~5000 connections, and then it deletes my entire log file except for a single chunk at a time, then the application crashes.
When I log the memory usage I am nowhere near the limit for my server.
If I remove all of my logging, everything works (presumably as expected, I don't know for sure as I can't inspect the logs), the application runs to completion without crashing.
Is there anything that I can do to get around this issue?
- deleting the log file and crashing may not be related to memory pileup as described in this issue, but some other problem.
- I can investigate, but how are you simulating 10M concurrent connections? are you using multiple systems for clients?
- do you have a core file that I can have a look at?
Hi guys,
I just tried both of those programs on node 8.11.0 and 10.1.0, with the programs running to the end and no apparent blocking.
// example 1
for (i = 0; i < 1000000; i++) {
console.log('abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc' + i);
}

For this the only gc traces I have are these:
[65723:0x103800400] 150 ms: Scavenge 7.0 (11.8) -> 5.7 (12.3) MB, 1.7 / 0.0 ms allocation failure
[65723:0x103800400] 158 ms: Scavenge 7.4 (12.3) -> 6.4 (16.8) MB, 1.4 / 0.0 ms allocation failure
[65723:0x103800400] 168 ms: Scavenge 10.0 (16.8) -> 7.8 (17.3) MB, 1.9 / 0.0 ms allocation failure
[65723:0x103800400] 178 ms: Scavenge 10.7 (17.8) -> 8.9 (19.3) MB, 2.6 / 0.0 ms allocation failure
[65723:0x103800400] 187 ms: Scavenge 12.1 (19.3) -> 10.1 (28.3) MB, 2.6 / 0.0 ms allocation failure
[65723:0x103800400] 210 ms: Scavenge 17.4 (28.3) -> 13.0 (29.3) MB, 3.8 / 0.0 ms allocation failure
[65723:0x103800400] 234 ms: Scavenge 18.8 (29.8) -> 15.1 (32.3) MB, 6.4 / 0.0 ms allocation failure
[65723:0x103800400] 257 ms: Scavenge 21.5 (32.8) -> 17.5 (50.3) MB, 5.5 / 0.0 ms allocation failure
[65723:0x103800400] 308 ms: Scavenge 32.2 (51.3) -> 23.2 (53.8) MB, 8.8 / 0.0 ms allocation failure
[65723:0x103800400] 373 ms: Scavenge 36.8 (56.3) -> 28.2 (60.8) MB, 14.0 / 0.0 ms allocation failure
[65723:0x103800400] 431 ms: Scavenge 44.6 (64.3) -> 34.4 (66.8) MB, 5.2 / 0.0 ms allocation failure
[65723:0x103800400] 453 ms: Mark-sweep 40.6 (67.8) -> 37.4 (70.8) MB, 3.0 / 0.0 ms (+ 27.4 ms in 408 steps since start of marking, biggest step 3.3 ms, walltime since start of marking 266 ms) finalize incremental marking via stack guard GC in old space requested
[65723:0x103800400] 503 ms: Scavenge 54.4 (74.3) -> 42.8 (76.3) MB, 3.7 / 0.0 ms allocation failure
[65723:0x103800400] 542 ms: Scavenge 58.7 (78.8) -> 48.8 (83.3) MB, 4.3 / 0.0 ms allocation failure
[65723:0x103800400] 580 ms: Scavenge 64.9 (84.8) -> 54.9 (88.3) MB, 4.4 / 0.0 ms allocation failure
[65723:0x103800400] 622 ms: Scavenge 71.0 (91.3) -> 60.9 (95.8) MB, 4.0 / 0.0 ms allocation failure
[65723:0x103800400] 661 ms: Scavenge 77.1 (97.3) -> 67.1 (100.8) MB, 4.1 / 0.0 ms allocation failure
[65723:0x103800400] 705 ms: Scavenge 83.1 (103.3) -> 73.1 (106.8) MB, 4.5 / 0.0 ms allocation failure
[65723:0x103800400] 750 ms: Scavenge 89.2 (109.8) -> 79.2 (113.3) MB, 5.1 / 0.0 ms allocation failure
[65723:0x103800400] 795 ms: Scavenge 95.3 (115.8) -> 85.3 (119.3) MB, 4.8 / 0.0 ms allocation failure
[65723:0x103800400] 859 ms: Scavenge 101.4 (121.8) -> 91.3 (125.8) MB, 5.9 / 0.0 ms allocation failure
[65723:0x103800400] 940 ms: Mark-sweep 106.4 (127.8) -> 99.8 (130.8) MB, 4.1 / 0.0 ms (+ 61.8 ms in 504 steps since start of marking, biggest step 1.7 ms, walltime since start of marking 179 ms) finalize incremental marking via stack guard GC in old space requested
[65723:0x103800400] 975 ms: Scavenge 112.7 (133.3) -> 102.0 (137.3) MB, 4.9 / 0.0 ms allocation failure
[65723:0x103800400] 1021 ms: Scavenge 119.3 (140.3) -> 109.5 (143.3) MB, 5.1 / 0.0 ms allocation failure
[65723:0x103800400] 1062 ms: Scavenge 125.3 (146.3) -> 116.2 (150.8) MB, 5.8 / 0.0 ms allocation failure
[65723:0x103800400] 1103 ms: Scavenge 132.5 (153.8) -> 123.2 (158.8) MB, 4.5 / 0.0 ms allocation failure
[65723:0x103800400] 1146 ms: Scavenge 139.4 (160.8) -> 130.1 (164.8) MB, 6.1 / 0.0 ms allocation failure
[65723:0x103800400] 1186 ms: Scavenge 146.3 (167.8) -> 137.1 (171.8) MB, 4.7 / 0.0 ms allocation failure
[65723:0x103800400] 1227 ms: Scavenge 153.2 (174.8) -> 144.0 (178.8) MB, 5.5 / 0.0 ms allocation failure
[65723:0x103800400] 1268 ms: Scavenge 160.2 (181.8) -> 150.9 (186.3) MB, 5.1 / 0.0 ms allocation failure
[65723:0x103800400] 1309 ms: Scavenge 167.1 (188.8) -> 157.9 (193.3) MB, 4.8 / 0.0 ms allocation failure
[65723:0x103800400] 1354 ms: Scavenge 174.0 (195.8) -> 164.8 (199.8) MB, 5.9 / 0.0 ms allocation failure
[65723:0x103800400] 1401 ms: Scavenge 181.0 (202.8) -> 171.7 (206.8) MB, 5.7 / 0.0 ms allocation failure
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc4725
[65723:0x103800400] 2512 ms: Mark-sweep 275.4 (309.6) -> 273.2 (312.6) MB, 156.9 / 0.0 ms (+ 148.6 ms in 1266 steps since start of marking, biggest step 2.1 ms, walltime since start of marking 1066 ms) allocation failure GC in old space requested
// example 2
var largeData = "SomeData".repeat(1024*1024) + ' ..... END!';
console.log(largeData); //using process.stdout.write here makes no difference either
process.exit();

Does this mean that the issue has been resolved on those versions, or am I just getting tricked?
I launched those programs with the --trace_gc flag and didn't see much related to the gc appear.
p.s. : I'm pretty new to memory analysis so I'm sorry if this is silly :\
[EDIT]: The problem only seems to appear when piping or redirecting in my shell (e.g. node --trace_gc foo.js > bar.log). How can this be?
I'm sorry for this extra question as well:
- MarkSweep is a blocking kind of GC
- Scavenge is not blocking
Is that correct?
@gireeshpunathil To clarify, I am scaling up to ~100,000 connections, not 10M. What I am doing is creating a promise that continues to cycle through a set of API calls until it is stopped. I continue to scale up the number of promises being fired until I reach 100,000.
If I remove logging, this method works, it is only when I add logging that it crashes. Does that clarify anything for you or do you still need a code example?
ok, sorry, 10M vs. 0.1M was my mistake.
Your statement "If I remove logging, this method works" brings the problem back to non-blocking stdio again.
"creating a promise that continues to cycle through a set of API calls until it is stopped" - is it possible to show me skeleton code for this so that I understand it fully?
that way we can confirm we are addressing the same issue as above or not. thanks!
I would like to discuss one of the problems described in this thread that is relatively more pronounced than the others from a usage perspective: truncated console logs on exit.
Can we make the write descriptors blocking? No, it will block the main thread: breakage in the asynchronous event-driven sequences in other parts of the code.
Can we make the write descriptors blocking at exit sites? No, it will potentially delay or deny the exit sequences: e.g. the descriptor is a socket or pipe that is full, and the recipient is not reading.
Is this behavior unique to Node.js? I don't think so; an _exit or even exit call in the middle of a write into a non-blocking fd is expected to behave in the same manner for any runtime (managed or unmanaged), with the only difference being that the pending data in the buffer is flushed in some cases, but that does not help flush pending data in the app in the case of non-blocking writes.
So I propose that it is fair to say the current behavior of console.log at the process.exit site is expected: just like a listening socket is torn down in the middle of a connection, or a scheduled function invocation through setTimeout() is abandoned, the truncation of console.log makes sense to me.
But there are genuine use cases which cannot function well with this.
So to bring this to a logical and long term solution, I suggest this:
- leave the non-blocking behavior of standard streams [1, 2] intact.
- introduce a new API that uses blocking descriptors by design (say console.logSync), well documented as to its behavior so as to be used with caution (a userland approximation is sketched below).
Please share your views.
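For reference, a minimal userland approximation of what a console.logSync could do, built on the existing fs.writeSync (logSync is just the proposed name, not an existing API):

const fs = require('fs');
const util = require('util');

// Format like console.log, then write synchronously to stdout (fd 1).
// fs.writeSync blocks until the data has been handed off to the OS.
function logSync(...args) {
  fs.writeSync(1, util.format(...args) + '\n');
}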
I see two issues:
- Minor: console.logSync is non-standard.
- Major: console.log('a'); console.logSync('b') indeterminately prints either a\nb or b\na. I expect that will turn into an endless source of surprise for users.
I don't think there is a general solution, just one-off hacks.
One such hack: add a new uv_flush_handle_sync() function to libuv that tells it to write out the pending data of a set of handles (or just one handle) in blocking fashion without otherwise involving the event loop.
sure - so if I follow you correctly, do the following at process.exit():
- capture the live stream handles (only tcp|pipe|tty) in the event loop.
- make them half-duplex (stop receiving data)
- capture their associated uv_write_t objects, and obtain the data buffers and descriptors.
- tear down the event loop (without closing the aforesaid fds - how?)
- invoke flush method
- proceed with the real exit sequences.
Is that how you see it? With that, the proposed method will be a trivial one as below, but the above sequence that Node has to perform may become tricky.
int uv_flush_handle_sync(int fd, char* data, size_t len) {
int r;
r = uv__nonblock(fd, 0);
assert(r == 0);
r = write(fd, data, len);
assert(r == len);
return 0;
}

Please let me know.
@gireeshpunathil Yes, that's the basic idea. The real implementation would be more involved: take a list of handles, a timeout, possible some flags.
Seems like this ran its course and can be closed. If I'm wrong about that, don't hesitate to re-open.
The idea that this is "expected" for non-blocking streams is fatally flawed: nobody expects logging or stdio to be non-blocking, and if you've changed the file descriptor flags, that's an implementation detail that should be hidden from the outside world.
Moreover buffering constitutes an implicit promise that it will eventually be written, so at least at exit all buffers should be fully written out.
Likewise OOM should trigger memory recovery, including writing out buffers.
Delays are less important than broken promises.
Just posting workaround for this issue for future reference:
for (const stream of [process.stdout, process.stderr]) {
stream?._handle?.setBlocking?.(true);
}

Before:
$ node -p 'process.stdout.write("x".repeat(5e6)); process.exit()' | wc -c
65536

After:
$ node -p 'for (const stream of [process.stdout, process.stderr]) stream?._handle?.setBlocking?.(true); process.stdout.write("x".repeat(5e6)); process.exit()' | wc -c
5000000

It seems that the phrase "non blocking" has been used in this discussion to mean something different from the POSIX definition, and that may account for some of the confusion.
In POSIX terms, buffering is what happens within a process, while non blocking and asynchronous are two different ways of interacting with the kernel.
A non blocking (O_NONBLOCK or O_NDELAY) kernel call will fail rather than block; this is certainly not what is envisioned by most participants in this discussion. (Moreover, it will only do so for streams that in principle can block for an arbitrary period, such as terminals, pipes, and network sockets; it does not affect writing to most ordinary files and block devices. )
In contrast, an asynchronous kernel call will neither succeed nor fail, but rather it returns to the running process immediately and subsequently uses a signal to notify the process of the success or failure of the requested operation. That's also not what is being discussed here, but could be useful as an implementation detail for buffering in userspace.
What is wanted here is the equivalent of the C language's setvbuf function: the ability to change the buffer size from "infinite" down to something smaller.
I would strongly urge avoiding the term "blocking" when defining a new public method to control this, as it unnecessarily muddles the meaning of related terms, cf. O_NONBLOCK.
@kurahaupo - Irrespective of whether the usage of the term blocking is consistent with the POSIX definition or not (I don't know if that consistency is relevant here), its usage in this project's documentation is consistent: a call that effectively blocks the Node.js event loop's forward progress (by virtue of the current thread yielding to the kernel for actively processing I/O). And I guess our users haven't had many issues with that definition.
If I understand this issue correctly this is the place to post my feedback. I wrote a Node.js Native Messaging host that streams output from child.stdout.on('data' (data) => {}) to the browser with process.stdout.write(data).
I observed that RSS constantly increases during usage. Perhaps someone in the Node.js realm can explain why this occurs and how to fix this memory leak https://thankful-butter-handball.glitch.me/.
This issue has been open for nearly 7 years. In the mean time mitigating work has been done and lots of discussion took place in other issues. I'm going to put this one out to pasture.
For people that are looking for workarounds, many are given here: https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe.
The one I chose was https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe/25378#25378, which is to wrap the command you want to execute with the 'script' command, which has a way to unbuffer stdout.
script -q /dev/null long_running_command | print_progress # (FreeBSD, Mac OS X)
script -q -c "long_running_command" /dev/null | print_progress # (Linux)
All the other solutions in that linked thread required installing a new program, but the script command is available on most POSIX-ish systems, albeit with different cmd line arg styles. This works for me - hopefully it can work for you.