nodejs/diagnostics

Expectation about tier of support from diagnostic tools and VMs

joyeecheung opened this issue · 19 comments

Action items:

  1. Look for feedback from users about their dependencies on existing diagnostic tools (e.g. do they hold back a Node.js upgrade because tool X stops working on newer versions of Node.js?)
  2. Look for feedback from the developers of the diagnostic tools about their support for Node.js updates (e.g. if a change in Node.js is going to break a tool, are they willing to support it, and how long would that take?)
  3. Look for feedback from VM vendors about their support for the development of diagnostic tools (e.g. if they are going to break an existing tool, are they willing to work with the tool authors to fix it, and at what priority?)
  4. Create a table (or tables) describing the different tiers of support across the different categories of tools, Node.js core, and the VM vendors

I made a non-exhaustive list of tools we have today (either inside Node.js core, in the VM or as external tools). If I forgot anything, feel free to edit the list and include it.

Tracing

OS/External Tracing Tools

  • DTrace
  • LTTng
  • ETW
  • SystemTap
  • eBPF tracing tool

Other

  • async_hooks
  • V8 trace_events
  • Node.js trace_events

Profiling

V8

  • CPUProfiler
  • HeapProfiler
  • SamplingHeapProfiler

External Stack Samplers

  • Linux perf
  • eBPF profile tool
  • DTrace
  • Windows xperf

Heap and Memory Analysis

  • mdb_v8
  • node-heapdump
  • Chrome DevTools
  • llnode

Step Debugging

  • Chrome Debugging Protocol
  • V8 Debugging Protocol
  • ChakraCore Time Travel Debugging

Other tools

  • node-report
  • 0x
  • node-clinic
  • eBPF node-specific tools (nodegc and nodestat)

Diagnostics WorkGroup Deep Dive Meeting on Expected Support Tiers - 2018-??-??

Time

TBD

Agenda

Brainstorm of available diagnostic tools

Before discussing support tiers for diagnostic tools, we need to know which
tools we have today and how stable they are.

Questions we need to answer

  • Which relevant diagnostic tools do we have today in the Node.js ecosystem?

Note: we might want to start a poll to gather feedback from users about this.
Maybe we should reach out to the User Feedback WG?

Support tiers for tools outside Node.js core

Most diagnostic tools we have today exist outside of Node.js core, which means
it's harder for us to keep them updated and working, even if they're under the
Node.js Foundation (e.g., llnode).

Questions we need to answer

  • Should we have test suites in place for those tools?
    • If so, should they be integrated into our current test suite or should they
      be part of CITGM? (or even something else)
  • Should we move some of these tools into core? (we're already doing it for
    node-report).

Node.js/V8 Support Tiers

Some tools rely on unsupported V8 APIs and features to work (e.g., Linux
perf, llnode, DTrace, 0x with --prof). Those tools are relevant in the Node.js
context because they usually offer greater granularity and less overhead when
diagnosing an application (qualities that matter for, e.g., servers but are less
relevant to browsers).

Questions we need to answer

  • How to handle breakage of these tools?
    • Should we block V8 updates if some of these tools break?
    • Should we provide built-in alternatives to these tools?
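One way to make the breakage questions above concrete is to write a candidate policy down as code. The sketch below is purely hypothetical: the numeric tiers, the `ToolStatus` fields, and the rule "block only when a tier-1 tool breaks and core has no built-in alternative" are assumptions for illustration, not something the group has agreed on.

```typescript
// Hypothetical support tiers; the number and meaning of tiers is still open.
type Tier = 1 | 2 | 3;

interface ToolStatus {
  name: string;
  tier: Tier;                     // hypothetical tier assignment
  brokenByUpdate: boolean;        // does the candidate V8 update break this tool?
  hasBuiltInAlternative: boolean; // does Node.js core offer an equivalent?
}

// One possible policy: block the V8 update only if some tier-1 tool breaks
// and users have no built-in alternative to switch to.
function shouldBlockV8Update(tools: ToolStatus[]): boolean {
  return tools.some(
    (t) => t.tier === 1 && t.brokenByUpdate && !t.hasBuiltInAlternative,
  );
}

// Every broken tool needs follow-up, whether or not the update is blocked.
function toolsNeedingFollowUp(tools: ToolStatus[]): string[] {
  return tools.filter((t) => t.brokenByUpdate).map((t) => t.name);
}
```

With this rule, a break in a lower-tier tool would be tracked for follow-up but would not on its own hold back a V8 update.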

Note: another big difference between Node.js and browsers is the presence of
native modules in Node.js.

Discuss possible support tiers for diagnostic tools

Define support tiers with their respective expectations

Questions we need to answer
  • Which support tiers should we provide?
  • What are the expectations for each support tier?

Define criteria for tools to go under a specific support tier

Questions we need to answer
  • What are the criteria for a tool to be under a specific support tier?
    • Tests? Number of contributors? Dependency on unsupported V8 APIs?

Invited

  • Diagnostics team: @nodejs/diagnostics

Should we discuss this during the upcoming collaboration summit? There's already a session booked for diagnostics. cc @mhdawson

Otherwise, we (Netflix) would be happy to host a diagnostic summit at our offices in Los Gatos, CA at some point.

I think we should add this to the topics we discuss at the Collaborator Summit. It's also a good time to start planning the next summit dedicated to diagnostics. The only thing I'd add on that front is that there was some discussion about whether the next one should be in Europe.

I can look into having nearForm host in Ireland if y'all would like :) maybe around NCEU so we can consolidate travel.

If nearForm doesn't host, I think it would be awesome if the V8 team hosted at Google Munich /cc @hashseed @bmeurer

I like the idea of consolidating travel with NCEU.

@mhdawson Given the conversation at the Collaborator Summit, can we start to get alignment on tiers of support for the toolchain that @mmarchini has listed here?

@yunong let's start with a list of the tools and then we can build the matrix which answers the questions above for each tool.

@mhdawson Is this something we can discuss at the next WG meeting?

Sure, adding the agenda tag.

let's start with a list of the tools and then we can build the matrix which answers the questions above for each tool.

@mhdawson I believe we can use the list above as a starting point.

I put together this gist as a starting point for what we might PR into the project somewhere:

https://gist.github.com/mhdawson/ead071dde71f71ae5af11ccbaca4f1ec

Please take a look and maybe we can discuss in the next WG meeting.

I'm somewhat tempted to add something along the lines of "gathering vs. visualization/processing vs. 'online'" to the table, but I'm not sure yet what exactly that would mean. Either way, I think the table as-is would be valuable to bring into the project.

@jkrems, I had similar thoughts: it's important that we don't break the generation of the data needed by external tools, but it matters a bit less if the data is still available and only the tools themselves need a fix. If you can think of a good way to fit it in, let me know.
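The distinction raised in the last two comments, between generating the data in core and processing it in an external tool, could be captured as a severity rule. A hypothetical TypeScript sketch follows; the `Layer` and `Severity` names and the two-level scale are assumptions made for illustration.

```typescript
// Hypothetical split: where the diagnostic data comes from (Node.js core / V8)
// vs. the external tool that gathers, visualizes, or processes it.
type Layer = "data-generation" | "processing";
type Severity = "high" | "medium";

interface Breakage {
  tool: string;
  layer: Layer; // which side of the split broke
}

// If core stops emitting the data, no fix in the external tool can recover it,
// so that case ranks higher than a break the tool can fix on its own.
function severityOf(b: Breakage): Severity {
  return b.layer === "data-generation" ? "high" : "medium";
}

// Order breakages so the most severe are handled first (sort is stable).
function sortBySeverity(breakages: Breakage[]): Breakage[] {
  const rank: Record<Severity, number> = { high: 0, medium: 1 };
  return [...breakages].sort((a, b) => rank[severityOf(a)] - rank[severityOf(b)]);
}
```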

PR based on tiers discussion in last Diagnostics meeting: nodejs/node#21870

Should this remain open? [I am trying to chase dormant issues to closure.]

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
