tapjs/tap-parser

Writing a more detailed specification for TAP in node

Closed this issue · 17 comments

Currently the TAP specification defines a bunch of semantics but it's missing a few things

  • What are comments and how do they work
  • What is inside yaml blocks

It would be great to define more semantics so we can write smarter tooling.

My personal interests in these a spec are two fold

As a first step I would like to talk about merging tap-parser and tap-out ( https://github.com/scottcorgan/tap-out ). I've been using tap-out as it has better heuristics for understanding tape output ( https://github.com/Raynos/itape/blob/master/context/index.js#L7 ).

I don't actually know what's necessary to merge the two and would have to look into it in more detail.

cc @substack @isaacs @scottcorgan

+1. Not sure if we should create a separate GH org or repo for this, but here's the general thinking behind the current tap-parser behavior:

  1. Start with the current TAP specification version 13
  2. Add Test::More style indented subtests. This is logically simpler for representing subtests than juggling the plan and test numbers weirdly. It should be more explicitly specified so that you MUST have the # Subtest: <test name> comment as the first thing in the child test, and MUST have the ok - <number> <test name> test point at the parent level at the end of the child test.
  3. Add # time=<number>ms as a comment to say how long a child test took to complete.
  4. Flat-out refuse to add any features that are not handled well by vanilla TAP 13 parsers (which will ignore indented subtests).
  5. Diagnostic blocks are actual YAML, not "yamlish".
  6. Comments can appear anywhere.
  7. Non-TAP data is non-fatal.
  8. Child tests are indented 4 spaces. Yaml diags are indented 2 spaces. We should specify that explicitly.

I'd love to get to a full and proper spec and just bless it as "The JavaScript TAP Specification" or something. If we could keep the featureset small enough to only include features that an existing test framework (eg, mocha, lab, tap, tape, etc.) actually depends on, that'd be even better.

Oh, last but not least, a plan or bailout is ALWAYS required. So, console.log('ok') is no longer sufficient TAP output. You'd have to do console.log('ok\n1..1') at least.

This change came out of a pet peeve of mine that sometimes when using node-tap 0.x, I'd have a test that aborted without actually completing all of its tests sometimes, and I shipped modules with silently failing tests on more than one occasion.

Oh, last but not least, a plan or bailout is ALWAYS required. So, console.log('ok') is no longer sufficient TAP output. You'd have to do console.log('ok\n1..1') at least.

A program that outputs

module.js:340
    throw err;
          ^
Error: Cannot find module 'lol'
    at Function.Module._resolveFilename (module.js:338:15)
    at Function.Module._load (module.js:280:25)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at [eval]:1:1
    at Object.<anonymous> ([eval]-wrapper:6:22)
    at Module._compile (module.js:456:26)
    at evalScript (node.js:536:25)
    at startup (node.js:80:7)
    at node.js:906:3

and exits non-zero should be valid TAP output. This is a failing test and TAP parsers should have a way of dealing with this as a failing test.

We could simply state that a program that exits non-zero should be interpret as not ok\n1..1 and it's STDOUT should be ignored.

@isaacs

the rest of your suggestions seem fine on the surface.

anko commented

@Raynos

We could simply state that a program that exits non-zero should be interpret as not ok\n1..1 and it's STDOUT should be ignored.

Let's not get this text protocol confused with an operating system API.

  • If you pipe a TAP-producing program to a TAP consumer, the consumer won't know what error code the producer closed with. It might even still be running and just have fclose'd its output streams. (All the consumer sees is incoming text, which then stops.)
  • TAP can be produced from environments that have no notion of a process, such as from a browser.

@anko @isaacs

I think you guys are correct; I'm confusing operating concerns that are unique to programs spawning child processes and have nothing to do with valid TAP.

Determining whether tests pass is a combination of "did the program succeed" and "does the TAP output claim the tests passed".

@Raynos In the require('lol') case, you're specifying the behavior of a harness that runs tap-emitting test processes, not the behavior of a parser.

Node-tap, for example, treats ANY non-zero exit code as a failure, even if all the test points in the stdout passed. For example:

$ echo 'require("lol")' > lol.js

$ tap lol.js -Rtap
TAP version 13
    # Subtest: lol.js
module.js:334
    throw err;
          ^
Error: Cannot find module 'lol'
    at Function.Module._resolveFilename (module.js:332:15)
    at Function.Module._load (module.js:282:25)
    at Module.require (module.js:361:17)
    at require (module.js:380:17)
    at Object.<anonymous> (/Users/isaacs/lol.js:1:63)
    at Module._compile (module.js:426:26)
    at Object.Module._extensions..js (module.js:444:10)
    at Module.load (module.js:351:32)
    at Function.Module._load (module.js:306:12)
    at Function.Module.runMain (module.js:467:10)
    1..0
not ok 1 - lol.js # time=350.902ms
  ---
  exitCode: 1
  file: lol.js
  command: /usr/local/bin/iojs
  arguments:
    - lol.js
  results:
    ok: false
    count: 0
    pass: 0
    plan:
      start: 1
      end: 0
      skipAll: true
  timeout: 30000
  ...

1..1
# failed 1 of 1 tests
# time=380.893ms

That's not to say we shouldn't specify these things, but just that they should be specified separately.

From a pure parsing pov, I think this is perfectly reasonable:

$ node lol.js | tap-parser
module.js:334
    throw err;
          ^
Error: Cannot find module 'lol'
    at Function.Module._resolveFilename (module.js:332:15)
    at Function.Module._load (module.js:282:25)
    at Module.require (module.js:361:17)
    at require (module.js:380:17)
    at Object.<anonymous> (/Users/isaacs/lol.js:1:63)
    at Module._compile (module.js:426:26)
    at Object.Module._extensions..js (module.js:444:10)
    at Module.load (module.js:351:32)
    at Function.Module._load (module.js:306:12)
    at Function.Module.runMain (module.js:467:10)
[ [ 'complete',
    { ok: true, count: 0, pass: 0, plan: { start: 1, end: 0 } } ] ]

Note that the error dump is happening on stderr, so this is equivalent to:

$ echo "" | tap-parser
[ [ 'complete',
    { ok: true, count: 0, pass: 0, plan: { start: 1, end: 0 } } ] ]

In this case, a thing that has literally no data is treated as a skipped test. If it has a test, but does not have a plan, it's a failure because that is not valid TAP.

$ echo "ok" | tap-parser
[ [ 'assert', { ok: true, id: 1 } ],
  [ 'complete', { ok: false, count: 1, pass: 1 } ] ]

Non-TAP output is treated the same as nothing:

$ echo "this is not tap" | tap-parser
[ [ 'extra', 'this is not tap\n' ],
  [ 'complete',
    { ok: true, count: 0, pass: 0, plan: { start: 1, end: 0 } } ] ]

Comments are also not relevant:

$ echo "#comment" | tap-parser
[ [ 'comment', '#comment\n' ],
  [ 'complete',
    { ok: true, count: 0, pass: 0, plan: { start: 1, end: 0 } } ] ]

@Raynos So, I guess what I'm saying is, a stream containing 0 asserts and no plan should be treated like 1..0 (a skipped test set), not like ok\n1..1.

@isaacs

you're specifying the behavior of a harness that runs tap-emitting test processes, not the behavior of a parser.

If you consider node test-program.js | my-tap-harness it would not be able to determine whether the tests passed or failed because piping does not have a good way of dealing with STDERR and exit codes.

I think this is my confusion in my previous comment.

piping does not have a good way of dealing with STDERR and exit codes.

Yep, bingo.

But just to hopefully clarify my point: the confusion is relevant. Any spec we write should probably have some tips for test harnesses to treat a non-0 exit code as a failure even if the stdout indicates success. This wouldn't be part of the protocol specification, though.

I'm going to drag up this discussion because I'd definitely like to see a more detailed TAP spec.

3 .Add # time=ms as a comment to say how long a child test took to complete.

This seems, to me, like it would be best as diagnostic information on all tests. For example:

ok 1 - some test
  ---
  time: 350.902
  ...

Admittedly this would increase the length of standard TAP output, but it is diagnostic information, and the yaml blocks are for diagnostic information. This keeps the status directive space clear, and comments continue to be superfluous to TAP formatters.


This leads me on to something I think is quite important - if we're discussing specifying/codifying much more of TAPs spec: well defined diagnostic properties. TAP 13 alludes to diagnostic properties one day being well defined, I think that a proper well-defined iteration should include some well specified diagnostic properties.

Just to start with, I think time (ms the test took to run), actual (if not ok, the actual value of the failed assertion), expected (if not ok, the expected value of the failed assertion) and message (if not ok, the message of the assertion error) should all be well standardised.

I would also stretch to saying we could support diffs in the diagnostic output, via a diff property:

not ok 1 - some test
  ---
  time: 1
  actual: hello world
  expected: goodbye world
  diff: @@ 1,1, +1,1 @@
    -hello world
    +goodbye world
  ...

It would be up to the test runner to decide how to represent the actual/expected objects via diff - but having a TAP formatter try to output diffs via actual and expected properties would be less than ideal because of non-enumerable properties or referential equality.

Of course, the downsides to supporting diffs on both runners and formatters is that its another format to parse - but a very useful one nonetheless.

I think adding a diff format like that would be deeply problematic, because it isn't valid YAML.

> y.load('time: 1\nactual: hello world\nexpected: goodbye world\ndiff: @@ 1,1, +1,1 @@\n  -hello world\n  +goodbye world\n')
YAMLException: end of the stream or a document separator is expected at line 4, column 7:
    diff: @@ 1,1, +1,1 @@
          ^

Well, yaml is actually a pretty flexible format. So yes - while my specific example was invalid (because I'm not fluent in the format) this one is:

---
diff: |
    @@ 1,1, +1,1@@
    -hello world
    +goodbye world

As long as the indentation level is kept then any character is valid.

Well... not ANY character, but sure, what you're now proposing is that a diff be presented as a multi-line string (and this can be either | or > or double-quoted, with or without the - or +).

You can do that now, but I think given that "diff" is such a multi-faceted thing, it might be better to just let the fanciness live in the reporter? It's not at all clear how to diff objects, and some people prefer inline vs top-and-bottom vs side-by-side diffs, etc. Plus, colors.

You can do that now, but I think given that "diff" is such a multi-faceted thing, [...] some people prefer inline vs top-and-bottom vs side-by-side diffs, etc. Plus, colors.

Yup - absolutely. However by providing one universal format (a unified diff), formatters (i.e. things that consume tap and output fanciness like human readable spec reports, or HTML) can optimise for those cases where needed. Diffs can be parsed into <u><li class="del">... or just ANSI escape codes. If people want side by side, great, unified diffs support that. Hell, if people want a TAP GUI client, unified diffs are a great format to consume for that too!

it might be better to just let the fanciness live in the reporter

Not trying to split hairs here - but I want to clarify terms... so: If by reporter you mean formatter (i.e. the thing that parses TAP and beautifies it, e.g. a spec formatter), then yes. All the fanciness is left to the reporter. Outputting a unified diff format gives the formatter the facility to do so.

If we just provide actual and expected, and let the formatter calculate a diff, its not doing to end well - because we're limited to serialisation formats, which probably means JSON (or more YAML) - which means that expected and actual outputs are probably going to be limited by concepts like referential equality or non-enumerable properties, and so reporters will give us diffs like:

Assertion foo bar failed:
   - {}
   + {}

If we provide an actual diff format, which gives the test-runner (the thing that produces TAP) the ability to provide well defined reports on what exactly went wrong, for example the test runner could check all enumerable keys, see that they're all the same, then look for the non-enumerable ones - and opt to display them in the diff. Same for referential equality.

A little off topic, but I just want to make you guys aware of this in case it helps. I wrote a TAP13 spec compliant tap parser that's easier to understand and more modular. https://github.com/m59peacemaker/node-tap-parser Let me know if I can do anything with it to help your goals and the community.

Thanks @isaacs for all your hard work on TAP 🎉