nodejs/post-mortem

Proposal for Common Heap Dump Format

yunong opened this issue · 32 comments

From the meeting on 1/28/16, the group aligned on producing a standardized common heap dump format. The current V8 heap dump format is a good starting point, but it has a few drawbacks. @davepacheco has volunteered to take a stab at a proposed common file format.

For my curiosity, can you list the drawbacks?

@bnoordhuis take a look at the meeting minutes: #12. It's captured there.

Thanks, @rnchamberlain. I'll take a look at those.

@bnoordhuis: the proposal will certainly include a more complete discussion. Of the several drawbacks mentioned on the call, I'm not sure about all of them because I haven't worked closely enough with the format yet, so please correct if these are wrong:

  • it appears to require O(N) memory usage to construct (at the very least, to assign ids to nodes and edges) and to parse
  • it does not include the values of any primitive types besides strings (numbers, booleans, null, undefined)
  • it does not include references to native values directly (which is a fine choice for its use-case, but makes it less useful for debuggers that want to combine JavaScript and native state, as you might want in order to print out a Node stream and its associated libuv structure and file descriptor)
  • it does not appear to have a place to put the main thread stack, which is important for debugging crashes

It also does not appear to have any complete, canonical documentation outside source code, which isn't a drawback of the format, but makes it a little harder to answer these questions.

Of course, the major pro is that it's supported by a number of existing tools. One design goal of any new intermediate format is that it be possible to translate it into a V8 heap dump, even if that means losing some of the information available in the intermediate format.
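For context on the drawbacks above, here is a simplified sketch of how the existing V8 snapshot packs its graph. The field lists are abbreviated and the data is handmade; real snapshots describe their exact layout in a `snapshot.meta` section and carry more fields per record:

```javascript
// Simplified sketch of the V8 heap snapshot layout (not the full schema):
// nodes and edges live in flat integer arrays, edges are stored in node
// order, and an edge's "to_node" is an offset into the nodes array rather
// than a node id.
const snapshot = {
  meta: {
    node_fields: ['type', 'name', 'id', 'self_size', 'edge_count'],
    edge_fields: ['type', 'name_or_index', 'to_node'],
  },
  // two nodes: "Object" (id 1, one edge) and "Array" (id 3, no edges)
  nodes: [0, 0, 1, 16, 1, 0, 1, 3, 32, 0],
  edges: [0, 0, 5], // to_node = 5 -> the node record starting at offset 5
  strings: ['Object', 'Array'],
};

// Resolving edges requires a running cursor kept in step with the node
// walk, which is part of why the whole graph must be assembled (and ids
// assigned) before serialization can begin.
function listEdges(s) {
  const nw = s.meta.node_fields.length;
  const ew = s.meta.edge_fields.length;
  const out = [];
  let cursor = 0;
  for (let off = 0; off < s.nodes.length; off += nw) {
    const name = s.strings[s.nodes[off + 1]];
    const id = s.nodes[off + 2];
    const edgeCount = s.nodes[off + 4];
    for (let i = 0; i < edgeCount; i++, cursor += ew) {
      const toOff = s.edges[cursor + 2];
      out.push(`${name} (id ${id}) -> id ${s.nodes[toOff + 2]}`);
    }
  }
  return out;
}
```

Here `listEdges(snapshot)` yields `['Object (id 1) -> id 3']`.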

Any thoughts on the best way to submit a proposal? Should it be a PR to add a file to this repo?

PR would probably make sense.

Agreed, a PR would be a good place to review and have the discussion. I'd suggest we use something like

docs/working/common-heapdump-format

as the location

and then once there is agreement on the format promote it to docs/

This would make it obvious to any observers that it's still a work in progress.

Sounds good.

I'd wanted to clean this up more before submitting it, but it's been long enough that I wanted to put what I've got out there. #19 is a PR for my current draft. Let's keep the discussion in this issue.

Should we plan another meeting to discuss this topic? I think that will be a nice forcing function to get folks to start reading and reviewing @davepacheco's hard work.

Dave, many thanks for a great start on this. A few initial comments:

  1. Is extending the existing V8 JSON format impossible? You mentioned that it comprises a single large JSON object.
  2. We are observing very high native memory usage when using the V8 HeapProfile TakeHeapSnapshot() API. Do you think this is a fundamental problem with the existing format (you mention "use of monotonic ids assigned to nodes and edges makes it difficult to generate without additional O(N) memory usage") - or is it perhaps something else?
  3. For JavaScript stacks, do you expect to add more top-level types for stack frames etc.? Also maybe other interesting things like pending callbacks.
  4. Having separate records for nodes and edges looks nice, but I worry that it will make the format more verbose than necessary - compared with the alternative of including a set of outbound references in each node record.

Yes, scheduling next meeting sounds like a good plan

@davepacheco @yunong where shall we share feedback and questions on Dave's hard work?

bmeck commented

@rnchamberlain

  1. the V8 format is ill-suited to serializing in O(n), since it does a gather phase and then serializes only after everything has been buffered (this is reflected in the design of the JSON).
  2. the gather phase is one of the large reasons for the high memory usage
  3. The heap format does have references to async listeners, but it needs more types to express things yes.
  4. V8's format does keep them separate; however, I don't think this should be a hard requirement, as maintaining separate lists slows down serialization

Thanks @bmeck for the details on the V8 heap dump. That reinforces my feeling that it's less than ideal to have to gather (and store) O(n) memory ahead of time in order to serialize the dump.

As for JS stack frame types: I think it's like object types (JSObject, JSArray, JSDate, JSRegExp, and so on). In practice there's a small, fairly fixed number of types. But every once in a while we add support for a new type, so it's useful if it's possible to extend it. For example, mdb_v8 only recently gained support for printing out JSRegExps. Similarly, I could imagine us adding richer support for ConstructorFrames or EntryFrames in the future, if that became important.

I'd love to add pending callbacks, although this might be better inferred by the consumer of the format from the contents of the dump. I'm not sure.

The reason I don't like adding outbound edges to the node records themselves is that it further constrains the emitter -- it has to have enumerated all of the outbound edges before it can emit the next node. Maybe that's always easy, but I'm not sure about that.

Thanks everybody for the feedback, and I look forward to the rest of the discussion.

bmeck commented

@davepacheco maybe we could be more clear about the problem with adding them to the node record itself; I will try to explain in the simplest case:

node id=1
  edge -> 2
  edge -> 3
node 2
  edge -> 1
node 3
  edge -> 1

In this case we dump 2 edges, and then walk the edges. We can save a little bit of space by not dumping what node we are on into the edge.

In the following we dump only when we visit:

node id=1
  1-> edge -> 2
node 2
  2 -> edge -> 1
  1 -> edge -> 3
node 3
  3 -> edge -> 1

But now we are dumping the source node for each edge, which is wasteful.

node id=1 edges=2
  edge -> 2
node 2 edges=1
  edge -> 1
  edge -> 3
node 3 edges=1
  edge -> 1

The above is a compromise that does not require two passes over the edges, and it is less wasteful than having all of the edges list their source node.
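To make the compromise concrete, here is one way a consumer could parse that third stream. The textual record format is invented for illustration, not part of any proposal; the point is that per-node edge counts plus a stack of still-open nodes are enough to attribute interleaved DFS output without repeating the source id on every edge:

```javascript
// Hypothetical parser for the compromise layout: when an edge arrives
// after the current node's count is exhausted, it must belong to an
// earlier node that is still open on the stack.
function parseStream(lines) {
  const graph = {}; // node id -> array of target ids
  const open = [];  // stack of { id, remaining } for nodes awaiting edges
  for (const line of lines) {
    const node = line.match(/^node (?:id=)?(\d+) edges=(\d+)/);
    if (node) {
      graph[node[1]] = [];
      open.push({ id: node[1], remaining: Number(node[2]) });
      continue;
    }
    const edge = line.match(/edge -> (\d+)/);
    if (!edge) continue;
    // discard nodes whose edges are fully accounted for
    while (open.length && open[open.length - 1].remaining === 0) open.pop();
    const top = open[open.length - 1];
    graph[top.id].push(Number(edge[1]));
    top.remaining--;
  }
  return graph;
}

const stream = [
  'node id=1 edges=2',
  '  edge -> 2',
  'node 2 edges=1',
  '  edge -> 1',
  '  edge -> 3', // node 1's second edge, emitted after backtracking
  'node 3 edges=1',
  '  edge -> 1',
];
```

With this input, `parseStream(stream)` reconstructs `{ 1: [2, 3], 2: [1], 3: [1] }`.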

@davepacheco - I'm assuming your format is intended to work with the existing V8 Snapshot API as another serialization format; is that correct? In which case I can see that it might be easier to dump Nodes and Edges separately, since V8 has already done the (significant) work to split them up. The problem with adding Edges to Nodes is that it makes it hard to access the resulting file randomly - nodes aren't all the same size, and generating an index is more work.

Changing the v8 APIs to do a linear walk of the heap from top to bottom dumping Nodes and their Edges as it goes is probably outside the scope of this issue. It would be faster and require almost no extra storage (particularly if it streamed directly) but the output would require post-processing to produce either this format or the existing JSON format of heap dump.

I kind of lean towards having the dump file be as fast as possible to create and then sorting everything else out with post processing offline as that's probably best for taking heap dumps in production. However given that v8 has done the work in the Snapshot API already we might as well make use of it and leave the Nodes and Edges split.

@hhellyer The intention was not specifically to support the existing V8 snapshot API. Rather, the main use-cases for generating this format are: (1) from a core file via mdb_v8, (2) from IDDE, and (3) from a V8 heap dump. Each of those emitters is pretty different, though, so I think you're right that we're best off making it relatively easy to generate the format.

The draft leaves open the physical format of the file. As a sequence of records emitted in order, it would not provide random access; as a SQLite file, consumers could use it to find arbitrary objects without requiring linear time to load the file.

@bmeck I see what you're saying, and if we go with the linear format (instead of sqlite, say), I think that's fine. Your example assumes that the emitter is walking the heap structure, but mdb_v8 doesn't do that because it doesn't know where the roots are. Instead, it walks the address space linearly, attempts to prune values that are not valid objects, and then emits what it finds. It doesn't follow the edges it finds -- rather, it's either already visited them, or it will visit them later. Given that, the case where your suggestion wouldn't work is when we don't know when we visit node N that it has a certain edge, but we discover it later (i.e., if there's a kind of edge whose in-memory representation in V8 points to its source node, rather than the source node pointing to the edge). The only edges I know about today represent object properties, array elements, and closure variables, and in all of these cases the node itself points to the edge, so this is probably not a problem.

@davepacheco The core dump -> new heap dump and old V8 heap dump -> new heap dump use-cases are important, but I think we also need to plan for direct output from Node/V8 to the new heap dump format. As @hhellyer says, if the mechanism for that can be a single-pass linear walk, we think it will be faster and lower footprint - which is what we need for rapid close-down and restart on out-of-memory events. As the mdb_v8 algorithm is also a linear walk, it seems to point to combining most of the edges into the nodes (i.e. a set of outbound references in each node record). We may need separate edge records for roots though.

@rnchamberlain I'm all for supporting that, but I don't know of anybody intending to use it, so I don't know what additional requirements it would impose. Would IDDE make use of that workflow?

I think the question on combining edges with nodes is a bit of a red herring. Anything capable of emitting edges alongside each node is clearly capable of emitting edges that also include the node identifier. mdb_v8 doesn't know anything about roots, unfortunately. Are we just worried about the extra space used to include the node identifiers with each edge?
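To illustrate the equivalence claimed above: an emitter doing a linear walk can emit edge records carrying an explicit source id in whatever order it encounters them, and a consumer can group them afterward in one pass. The record shapes here are invented for illustration:

```javascript
// Hypothetical mdb_v8-style emission: node and edge records appear in
// arbitrary order, each edge names its source, and grouping is deferred
// to the consumer. The only cost versus nested edges is the extra id.
const records = [
  { kind: 'edge', from: 3, to: 1 },
  { kind: 'node', id: 1 },
  { kind: 'edge', from: 1, to: 2 },
  { kind: 'node', id: 3 },
  { kind: 'edge', from: 1, to: 3 },
  { kind: 'node', id: 2 },
  { kind: 'edge', from: 2, to: 1 },
];

// Single post-processing pass: build the adjacency map, tolerating edges
// that arrive before their source node's record.
function group(recs) {
  const graph = {};
  for (const r of recs) {
    if (r.kind === 'node') graph[r.id] = graph[r.id] || [];
    else (graph[r.from] = graph[r.from] || []).push(r.to);
  }
  return graph;
}
```

`group(records)` produces the same adjacency map as the nested layouts: `{ 1: [2, 3], 2: [1], 3: [1] }`.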

bmeck commented

@davepacheco I think speed is a bigger concern than space for me personally. The faster the dump the less disruptive to service it is. A few MB for the extra id is fine as long as speed and amount of data are sufficient.

A PD workflow I think we need is Node/V8 -> new heap dump -> Chrome dev tools heap profiler (as well as the option to produce a new heap dump from a core dump). This assumes we'll need to add a parser for the new format in the dev tools heap profiler; is that feasible?

IDDE doesn't have the dominator tree/retained set algorithm that's needed for easy analysis of heap usage and leaks.

@rnchamberlain It sounds like you're suggesting broadening the scope of the format to fit into some of the same use cases that the V8 heap dump format currently supports. I think that would be great, but I had not been considering those as primary use cases. If people are interested in implementing those components, that's great.

I don't think there's any major decision about the format hinging on this question, so this may all be moot at this point.

@davepacheco want to start by saying great job on the initial doc. I'd agree with the earlier comments that being able to generate the dump quickly to avoid impact to the production environment is important. I'd also vote for a simple format that can be post-processed when needed to accelerate analysis.

Is speed really an issue? We can always generate an OS core dump quickly, and produce tooling to convert the core dump to this format. It seems like if you want speed, the core dump is what you should be using anyway.

bmeck commented

@yunong Core dumps are super complex as a starting point for what we need to consume. Maybe you know of a good library for working with core dumps in a programmatic way? LLDB and MDB are scriptable, but it would not be pleasant to get them to generate data to a file.

I agree with @yunong, as core dumps are in any case what makes the post-mortem experience and results efficient.

@yunong @lucamaraschi thanks for the input, that makes a difference. Actually it's the way things went with IBM Java. Operations often preferred to produce core dumps. Also the heap dump formats stagnated so analysis tools ended up getting more insight from core dumps. The problem is the tools investment needed to extract the data from core dumps (as @bmeck says).

@davepacheco I have just a couple of questions/thoughts:

  • How do you see this format being used for different JavaScript engines?
  • I like the idea of graph storage for post-processing... but shouldn't we leave that up to the consumer?

I might be off, but I think we should think of this format as a protocol (and maybe that was the intention of the document ;-))

@yunong shouldn't we summon a quick "get together" meeting to see how to move from here and maybe start producing some workable solution?

@lucamaraschi It will be hard to know how well this format supports other JavaScript engines without input from someone who has developed similar postmortem debugging tooling for one of those other engines. Some of the aspects of this format really assume the V8 internal representation, and the document explicitly calls that out:

The representation of JavaScript values should reflect the well-established
internal V8 structures (e.g., SMIs vs. other heap values). This unfortunately
couples the format to V8's implementation, but we believe this is necessary to
avoid leaky abstractions and to obtain an accurate view of memory usage and
the relationships between objects.

To elaborate on that: it's possible for two different JavaScript engines to differ greatly in the way references are tracked between objects, and this could affect the memory utilization of different programs. For example, as I understand it, V8 closures hang onto any values in scope when the closure was created that any other closures created in the same scope might also reference. That doesn't have to be true for a different engine. In order to present information about which objects (or closures) are hanging on to which other objects (or closures), the format needs to preserve something about engine-specific representations.
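The closure behavior described above can be seen in a small example. This reflects V8's commonly documented context-sharing; other engines are free to behave differently, which is exactly the engine-specificity at issue:

```javascript
// `small` never references `big`, but because `keepsBig` closes over
// `big` in the same scope, V8 stores `big` in a shared context object
// that `small` also retains. In a heap snapshot, `big` can therefore
// appear retained through `small` even though `small` never uses it.
// This is engine-specific behavior, not something any spec requires.
function makeClosures() {
  const big = new Array(1024).fill('x');
  const keepsBig = () => big.length; // forces `big` into the context
  const small = () => 42;            // shares that context in V8
  return { keepsBig, small };
}

const { keepsBig, small } = makeClosures();
```

A format that wants to explain why `big` is still alive has to preserve the engine-specific context edge from `small`'s closure.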

Besides that, if we genericized the format, I don't know how we could know that we'd done so correctly without someone who was actively going to use it for a different engine. So I suspect this format is fairly V8-specific, but I think that's appropriate.

On your second question of graph storage post-processing: certainly the primary goal is to allow the consumer to do that, but if there's no downside to storing the representation in a database with enough structure for random-access (which may even be substantially simpler than the alternative), that may be worthwhile.

@davepacheco I understand, but isn't it possible to separate what we think/predict/assume is part of a common specification from what is completely specific to a particular engine? I agree with you when you say that

this format is fairly V8-specific, but I think that's appropriate

but I am trying to go a little bit beyond this line and see if we can drive a collaborative effort maybe with the peeps that are working on Chakra.

On the point of the storage, I completely agree with you, as we should think more about the API and protocol specification than about the implementation details.

I wanted to restart this thread a bit with some thoughts based on our experience with JavaScript heap snapshots in ChakraCore (main files for serialized representation here and extraction here). The original motivation for this work was the record/replay code underlying the time-travel debugging and production diagnostics functionality. In these scenarios the cost and size of extracting the snapshots were critical, and we have been able to bring these costs down to very low levels: e.g., running TechEmpower in the JSON config, a snapshot takes 30ms and compresses to a 2MB human-readable format on disk (here). The resulting snapshot contains enough information to inflate the program state and re-execute the JavaScript code, which makes it possible to load the snapshot and inspect it with a regular debugger. Additionally, we have worked to make the snapshots as independent of the underlying binary layouts and JavaScript engines as possible.

Going forward we wanted to investigate using this format (and associated techniques) as a general diagnostics snapshot format. Key issues that we would like to investigate going forward are:

  • Extending the format to include key Node.js state such as:

    • Pending callbacks.
    • Memory consumption details.
    • etc.
  • Generalizing the format to support v8 as well as ChakraCore:

    • What can be generalized.
    • What needs to be split out into engine specific components.

I am currently working on implementing a basic JS parser and graph library for post-processing of the snapshots. I am planning to use this library for memory diagnostics and program invariant generation tools. However, it would be great if we can build this out in a way that is useful to the wider community as well for diagnostics and other applications.

Perhaps there is an opportunity to integrate low cost JS heap snapshots with something like node-report to get both native and managed information.

Closing due to inactivity. If this work still needs to be tracked, please open a new issue over in https://github.com/nodejs/diagnostics.