Feature: Support error causes
conartist6 opened this issue · 33 comments
This is a proposal to support printing stacks of errors nested with error.cause. The JS spec allows error.cause to be any value, but this proposal is mainly concerned with what happens when cause is present as a string or another error.
I've written up the most descriptive form of this proposal as a proposed Jest feature. If you don't mind I'll avoid copying it all out here. Jest uses stack-utils under the hood to reformat errors.
The TC39 committee has so far indicated that the language will not provide any official implementation of printing nested errors. Their hands are tied to some extent until they finish specifying the semantics of error.stack. Until then, though, it will be very useful to have best-effort printing and formatting for errors printed in the most common way.
I think errors with causes should be printed more or less like this:
SomeError: The grumblecakes could not be found
at grumblecake-locator.js:12:2
Caused by: QueryError: Invalid query `SELECT <grumblecakes>`
at db.js:51:4
Caused by: ParseError: Unexpected character `<` at position 8
at db.js:1233:10
Currently there is no standard for this. I think stack-utils would be an excellent library in which to create an implementation that formats, prints, and loosely parses errors with multiple messages and stack traces.
I'm willing to build it. Is there interest from the maintainers?
I've adapted the project's current regex into a lexer that I think will work:
const moo = require("moo");

let lexer = moo.compile({
  Whitespace: { match: /[ \t\n]/, lineBreaks: true },
  CausedBy: /^\s*Cause(?:d by)?:/,
  // Any line ending with file:line:col is a stack frame
  // file:line:col may be wrapped in parens, or may be the literal 'native'
  StackFrame: /^.*?(?:native|:\d+:\d+|\((?:.+?:\d+:\d+|native)\))$/,
  FreeText: /.+$/,
});
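For reference, a minimal driver would look like this (assuming errorText holds the example error text above; moo lexers are iterable once reset):

lexer.reset(errorText);
for (const token of lexer) {
  console.log(token.type, JSON.stringify(token.text));
}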
Stack frame lines would be subject to further parsing. In particular the current big stack-frame-line regex will need to be broken down in order to support embedding URI syntax and fix #60.
Here's a condensed summary of the tokens emitted from that lexer when run on the example input above:
[
  {
    type: 'FreeText',
    text: 'SomeError: The grumblecakes could not be found',
  },
  {
    type: 'Whitespace',
    text: '\n',
  },
  {
    type: 'StackFrame',
    text: 'at grumblecake-locator.js:12:2',
  },
  {
    type: 'Whitespace',
    text: '\n',
  },
  {
    type: 'CausedBy',
    text: 'Caused by:',
  },
  {
    type: 'Whitespace',
    text: ' ',
  },
  {
    type: 'FreeText',
    text: 'QueryError: Invalid query `SELECT <grumblecakes>`',
  },
  {
    type: 'Whitespace',
    text: '\n',
  },
  {
    type: 'StackFrame',
    text: 'at db.js:51:4',
  },
  {
    type: 'Whitespace',
    text: '\n',
  },
  {
    type: 'CausedBy',
    text: 'Caused by:',
  },
  {
    type: 'Whitespace',
    text: ' ',
  },
  {
    type: 'FreeText',
    text: 'ParseError: Unexpected character `<` at position 8',
  },
  {
    type: 'Whitespace',
    text: '\n',
  },
  {
    type: 'StackFrame',
    text: 'at db.js:1233:10',
  }
]
Oh boy this project actually already supports "nesting" of error stacks through string concatenation as done by nested-error-stacks. Unfortunately the behavior of nested-error-stacks is fundamentally not interoperable with errors that have causes. After all, to print errors stacked using cause you need access to their unstacked stack traces, which are destroyed by nested-error-stacks.
In practice nested-error-stacks outputs the same Caused by: string that I'm now working on parsing, so the only real difference would be in printing. If err.cause is present, then correctness of the output relies on err.stack not containing the same nested traces. The good news is that nobody should be doing this. Thus it may be fine to just print everything and let it come out in the wash.
Oh and I think that answers another question I had. I was wondering why the at prefix at the beginning of a stack trace line is not required, and that's the reason (edit: or part of it -- cleaning a trace with this tool can also have that result). The parsed format essentially mirrors the trickery that nested-error-stacks does; that's how it keeps track of what's a nested cause, which must then be outdented.
It was my intention anyway, but I'd say my plan is to make breaking changes and publish my fork. Once I have something that works and that I can actively maintain, we can discuss whether the fork should merge back into this package.
Hmm, clean is pretty explicit though:
if (!/^\s*at /.test(stack[0]) && /^\s*at /.test(stack[1])) {
  stack = stack.slice(1);
}
If a line doesn't begin with at it isn't part of the trace. This looks a little off to me though -- the initial message could be more than one line long.
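A sketch of a more tolerant version, under the same assumption that stack is an array of lines: skip forward to the first frame line rather than assuming the message is exactly one line long.

// Drop every leading line that isn't a frame; they belong to the message.
const firstFrame = stack.findIndex((line) => /^\s*at /.test(line));
if (firstFrame > 0) {
  stack = stack.slice(firstFrame);
}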
I'm very tempted to rip out support for method and file names which contain ( in my fork. I recognize that the code in this project handles quite a lot of cases, but the way it does it is terribly convoluted. I think it's a heck of a lot easier to just explain: "hey, if you're going to parse your errors, the results aren't going to be good for a file named )(.js".
Similarly I think it's probably safe to assume that most method names won't have () in them. I've never seen that professionally. Of course it's trivial to create such a thing, but that doesn't mean it's not fine to have less-than-ideal behavior when it is encountered. Since file names are usually allowed to contain ( and ) and identifiers are not, I think it's likely better practice for the parser to give some mangled results when the function name contains parens. And we can soften the cases which fail: we're sure that the parens around a file name will be preceded by a space.
That would allow us to handle some pretty weird stuff without doing anything all that weird, e.g.:
const obj = {
  'weird fn name()': () => {
    throw new Error('message');
  },
};

obj['weird fn name()']();
which results in
Error: message
at Object.weird fn name() (/home/conartist6/dev/repos/stack-utils/space file.js:3:11)
at Object.<anonymous> (/home/conartist6/dev/repos/stack-utils/space file.js:7:23)
I could also stipulate that results may be bad if file names contain mismatched parens. Then I could just go back from the end of the frame string until I find the nearest balanced paren. I'd slice that part out and feed it back into a path parser that could differentiate between URIs and different kinds of paths.
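Here's a sketch of that backscan. findSite is a hypothetical helper, and per the stipulation above it simply gives up on mismatched parens:

// Walk backward from the end of the frame string to the nearest balanced
// parenthesized group; its contents would be fed to the path/URI parser.
function findSite(frame) {
  if (!frame.endsWith(')')) return null;
  let depth = 0;
  for (let i = frame.length - 1; i >= 0; i--) {
    if (frame[i] === ')') depth++;
    else if (frame[i] === '(' && --depth === 0) {
      return { rest: frame.slice(0, i), site: frame.slice(i + 1, -1) };
    }
  }
  return null; // mismatched parens: no balanced group found
}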
I see from testing that Windows and Unix path styles are sometimes mixed in a single trace, which is just a pain.
at Object.<anonymous> (W:\home\conartist6\dev\repos\stack-utils\space file.js:7:23)
at Module._compile (internal/modules/cjs/loader.js:689:30)
It may be that the easiest way to deal with this is to always tokenize parens. This will sometimes split apart parts of a single token like Object.weird fn name() that really belong together, but then the parser will be able to handle balancing of parens and make the best possible effort to match the basic structure of a line (at method (file:1:2)), which will just leave us to find any paren tokens that were not consumed and print them back out as literals. If a file is defined as acceptable when it has balanced parens, this should work fine.
My new lexer is:
let lexer = moo.states({
  error: {
    Newline: { match: "\n", lineBreaks: true },
    Whitespace: /[ \t]/,
    CausedBy: { match: /^\s*[cC]ause(?:d [bB]y)?:/, value: () => "Caused by:" },
    At: { match: /^[ \t]*at /, push: "frame" },
    // Anything else on the line is free text (e.g. an error name and message)
    FreeText: /.+$/,
  },
  frame: {
    Newline: { match: "\n", lineBreaks: true, pop: true },
    "(": "(",
    ")": ")",
    Whitespace: /[ \t]/,
    FrameFragment: /[^() \t\n]+/,
  },
});
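As a sanity check, feeding the stateful lexer one unindented frame line (matching the unindented examples above) shows the push into and pop out of the frame state, with parens coming out as their own tokens:

lexer.reset('Error: boom\nat foo (bar.js:1:2)\n');
for (const token of lexer) {
  console.log(token.type, JSON.stringify(token.text));
}
// FreeText 'Error: boom', Newline, At 'at ', FrameFragment 'foo',
// Whitespace ' ', '(', FrameFragment 'bar.js:1:2', ')', Newline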
And here is a complete parser. It's still missing some of the more minor features, but it proves that everything works together. Try pasting it into the nearley playground.
Aight I'm out for the day. Signing off. No more work. Thanks for your help everyone! Errors are gonna be great!
In the end everything I was afraid I might not support works great. This parser will be even more flexible and powerful than the current one.
Ah ok, now we get into more fun stuff. The more we add into these stack lines the more we're going to hit a very particular problem: the grammar is fundamentally ambiguous. foo [as bar] could conceivably be a reference to a method called foo [as bar], or (much more likely, of course) a function foo used as a method bar. Nearley will output both interpretations, which on one hand is great, but on the other hand there might be a lot of these lines, and allowing ambiguities to compound (exponentially) would be disastrous.
I think what this means is that I need two parsers, one for errors, and one for lines. That way the line parser can be ambiguous by design but for each line we'll eliminate any ongoing ambiguity.
The only question is how much overlap should there be? If stack lines are any lines that begin with at, there need be little overlap. But if we aim to support stacks where the at prefixes have been stripped away, then we need to keep enough of the frame parsing to identify (file:line:col) at the end -- something we already know we are able to do unambiguously.
Do we still want a scanner for the frame parser? I could certainly simplify the frame parser code a lot if I just trashed it. Is it doing anything for me? I suppose since the frame parser is fundamentally ambiguous it might help with speed. When you're doing a permutations problem it helps to have less stuff to permute. I guess I'll leave the tokenizer for now, but maybe once this is all working I'll put together a benchmark and see what happens to perf if I toss it.
I'm happy to report that it's all coming together. My new implementation now supports all the syntax that this parser supports as well as some new things like async function prefixes and <anonymous> as a location (anonymous generators output this). Working on getting the test suite to pass, and I need to consider how to present the differences between paths and URIs.
The new parseFrame now passes the old test for parseLine. I see that in addition to nested-error-stacks, there is more general support for freetext frames such as
Error:
at outer (produce-long-stack-traces.js:7:27)
From previous event:
at Object.<anonymous> (produce-long-stack-traces.js:6:36)
My initial idea was to parse into a structure like this:
type StackFrame = {
  type: 'Frame';
  file: string;
  function?: string;
  // ...
} | {
  type: 'Text';
  text: string;
};

type ParsedError = {
  name: string;
  message: string;
  stack: Array<StackFrame>;
  cause?: ParsedError;
};
The weird thing with this design is that causes are, from a syntactic perspective, a subset of freetext stack frames, yet this structure treats them in completely different ways. The other drawback is that it's actually fairly valuable, I think, to have the ability to reverse-iterate the cause stack, i.e. going from most specific to least specific as error handling does in real code. Essentially, chaining cause through a cause property creates a singly linked list, where an array would more easily permit reverse iteration.
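For illustration (toChain is a hypothetical helper), flattening that linked list into an array is a short walk, after which reverse iteration is trivial:

// Collect the outermost error and each successive cause into an array.
function toChain(error) {
  const chain = [];
  for (let err = error; err != null; err = err.cause) {
    chain.push(err);
  }
  return chain;
}

// toChain(err).reverse() visits the root cause first -- most specific to
// least specific, the order error handling sees them in.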
If I do use an array I'd probably need a new type to store it in. I'd have something like:
type ErrorChain = {
  chain: Array<Error>;
};

type Error = {
  name: string;
  message: string;
  stack: Array<StackFrame>;
  isCause: boolean;
};
This could work much better because there'd be no more need for "freetext frames". Instead the freetext would just become the message in a new error, and the stack would always be guaranteed to be an array of frames, which seems useful from the perspective of writing code to filter out spurious frames.
I think the combined advantages of the chain approach are worth making it my new design.
Hmm, thinking about this some more, I don't like isCause. Mainly, the reason we need to understand it is so that Caused by: doesn't show up as part of the message for the next error in the chain.
To generalize that functionality I could:
- ...make the tool configurable, so that you could feed it a list of strings to treat as not-part-of-message.
- ...split out anything before the first : in a message and store it in a property. The first message for the top stack frame would not do this.
I'm tempted to do the second thing. It would be easier to implement, test, and document, and it does all it needs to do. The purpose of this tool has never been (in my mind) to "understand" errors. Someone could build that on top. This just needs to be able to tweak and reprint.
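A sketch of that second option (splitPrefix is a hypothetical name):

// Split anything before the first colon out of a freetext line so that
// 'Caused by:' never leaks into the next error's message.
function splitPrefix(text) {
  const index = text.indexOf(':');
  if (index === -1) return { prefix: undefined, message: text };
  return {
    prefix: text.slice(0, index),
    message: text.slice(index + 1).trimStart(),
  };
}

// splitPrefix('Caused by: QueryError: Invalid query')
// → { prefix: 'Caused by', message: 'QueryError: Invalid query' }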
Completely revamping the representation of stack to make it not-megamorphic and easier to document/extend.
Well I've got pretty much a full implementation of the ambiguous parser approach, but while I initially thought there would be a little ambiguity, now that I've implemented everything it's clear that there's a LOT. I'm on the fence about what really makes sense to do in this case. The only way to cut down on the ambiguity really is to cut down on the stupid stuff that we currently make our best effort to handle. One option might be to have two versions of the parser, one with no ambiguity (or significantly reduced ambiguity) that can be run as a first pass. In the common case this would be all that is needed. But if that fails, we could incur the cost of running a more permissive parser.
The main question that arises from this is: is it even worth having the backup parser? Well, that depends on what it costs. The nearley API allows me to specify a rule as ParserStart. By default it's the top production, but I could create a strict and a loose top production, perhaps in such a way that lower-level productions could be reused (to save space). I could possibly save some runtime by avoiding repeating the lexing step, though there's no obvious API for doing that. If these optimizations were accomplished robustly, they would seem to provide the best of all worlds.
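To make that concrete, here's a sketch of the two-pass idea against the nearley API; the rule names StrictFrame and Frame, and the compiled grammar module, are assumptions:

const nearley = require('nearley');
// The module nearleyc would generate from the grammar (assumed path).
const compiled = require('./grammar.js');

function parseFrame(line) {
  // First pass: a strict top production with little or no ambiguity.
  const strict = new nearley.Parser(
    nearley.Grammar.fromCompiled({ ...compiled, ParserStart: 'StrictFrame' })
  );
  try {
    strict.feed(line);
    if (strict.results.length >= 1) return strict.results[0];
  } catch (e) {
    // The strict grammar rejected the line; fall back to the loose pass.
  }
  // Backup pass: the permissive top production; may yield multiple parses.
  const loose = new nearley.Parser(
    nearley.Grammar.fromCompiled({ ...compiled, ParserStart: 'Frame' })
  );
  loose.feed(line);
  return loose.results[0];
}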
Yeah, I think that'll work. I can reuse most of the parser. I'll factor out the postprocessor (node building) code into reusable chunks too.
Also it might be pointless to save the lexer's output. It takes up resources in the common case, and in the less common case of real ambiguity lexing again will still likely be very fast compared to the cost of the ambiguous parsing.
Ah, and most of the rules would need to change, so the optimizations are bunk. But even so, I think the two-parser method is probably still worth it. I think the average frame results in at least seven ambiguous parsings internally, which stem from the format of Site, which is usually seven tokens: (, path, :, digits, :, digits, ). Each of those tokens is ambiguous, as it could also be part of the freeform function name which immediately precedes it. Only once the parser has proceeded through to the end of that token stream (tracking several possible alternatives) does it realize that 6 of the 7 alternatives fail to have a valid Site. Thus I think there are still significant additional savings in creating a first-pass strict frame parser which doesn't have that problem.
K, got that working. Time to fix the test suite again. You know, at one point all the tests in the suite passed, and I honestly have no idea why, 'cause there were definite problems with the parser.
I need to think about how to handle spaces. Right now I've treated spaces as JSON does, for example: collapse multiple spaces down into one token and basically ignore spaces anywhere that isn't inside a literal. Of course here it's a bit fuzzier exactly what a literal is, but anyway. The current codebase tests this case:
at f[i](l<e>:.js:1:2) :2:1
In the current parser the file part is expected to be f[i](l<e>:.js:1:2). This is only possible to deduce if you are absolutely sure about the way the printing code handled spaces, which seems like an unnecessarily strong assumption. If you let the leading and trailing spaces get trimmed away, you lose support for filenames that begin or end with a space. Why not though! If anyone is making files with names like that, so what if they have to trim the real filename before they can compare it to the one in the parsed stack? So yeah, when I lay the case out like that, I think it's fine to say that this behavior changed in my implementation.
I've added tests which validate that I don't fall out of the strict parser's grammar for frames that don't do anything weird. It's already caught some issues. To be able to write the test I had to introduce a strict option to parse, which seems like probably a good idea anyway.
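For illustration, usage might look like this (parseError and the exact option shape are hypothetical at this point):

// Fail loudly if any frame falls outside the strict grammar.
const parsed = parseError(errorText, { strict: true });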
Oh god, I'm testing my work on stack-utils using AVA, which formats its errors with stack-utils. I tried to stop the test on an uncaught exception and I was confused as to why the failing code was dealing with stack-utils but didn't seem to be the one I was working on. In fact I've just discovered yet another deficiency in the current stack-utils: it can't parse the frame at async Promise.all (index 0), which is making its own test runner blow up. >_<
I can't figure out if the first eval in a frame that reads at eval (eval at ... is ever any text other than eval. I don't see any counterexamples, and without any I don't know what to call it.
It has come to my attention that this is a parser for v8 stack traces, i.e. Node, Chromium, Chrome, and I guess now Edge. I think this answers the question of what I'll do when I'm done with this work. I don't really want to have to ask permission to publish something that I wrote, but I also don't want to have to name the package something silly. So it can be v8-stack-utils.
Site will now have a type, one of 'anonymous', 'native', 'file', 'uri', or 'index'.
I'm really getting towards the finish line now. I think I'm content with the API I've created, and I'm working on the test suite.
If you read this far I commend you! If you didn't read everything above I don't commend you, but it's OK, I was just talking to myself. I started describing a feature I wanted and ended up building a whole replacement library. It's called stack-tools. Check it out!