tapjs/tcompare

v6 changes

Closed this issue · 1 comments

For tap v18, the following changes are planned.

Planned changes:

  1. rewrite in typescript, export a hybrid module (as part of tap v18's comprehensive ts/esm support, in tapjs/tapjs#821 )
  2. refactor considerably so that instead of building up one diff string along the way, only to rip it back into two strings and re-combine with jsdiff, just build up 2 strings from the get-go, and diff them with jsdiff.

Under consideration, somewhat tentative: object diffs only show the object path to differing fields. This will vastly reduce the size of the strings being handled, because we only need to print it at all when the printed object and expect values differ.

This means if you have an object like:

obj = {
  long: 'list',
  of: {
    many: 'properties',
   '//': '... so many more here ...'
   x: {
    y: {
      z: 1,
      a: 2,
    }
  }
}

and you compare it with:

expect = {
  long: 'list',
  of: {
    many: 'properties',
   '//': '... so many more here ...'
   x: {
    y: {
      z: 1,
      a: 3,
    }
  }
}

then you'll get a diff like:

--- expect
+++ actual
 Object {
   "of": Object {
     "x": Object {
       "y": Object {
-        "a": 3
+        "a": 2
       }
     }
   }
 }

In the case of Sets (and, arguably, Arrays), omitting the other entries in the object kind of obliterates context, so in those cases, we'll just print both objects naively, and let jsdiff sort it out. If it turns out this isn't actually helpful (or is something you might want to switch off sometimes), then it's pretty simple to have a switch to disable it and show the more naive diffs always.

The reasoning is that the current version of this module spend a lot of time in string munging, mostly due to throwing large strings at jsdiff. Dumping an object with format() is reasonably fast, but the added string processing for large diffs hits an inflection point for large objects (as seen in tapjs/tapjs#699).

More often than not, you don't care about most of that data, it's literally just being thrown away. The important part is just identifying the field that's different, so you can see clearly where it lives on the object, and many tests fail in this way for a difference that's one or two lines in the diff. (Unless you're doing t.same(largeObject, otherLargeObject) and one of them is null or something, but there's not much I can do about that.)

In playing around with an initial proof of concept, it does sometimes feel a bit lacking to only see the field that differs - it might be good to somehow also include the siblings of that field, maybe abbreviated in some way (ie, include the key and type or simple value, but don't descend into children? This gets considerably more complicated, though, because the child nodes need to know if any of their siblings got unmatched, and they're currently all walked over sequentially and thrown away.)

On the plus side, you almost always get an indication of the path to the differing field, and hardly ever lose track of the context of the key path, which is much more important context than sibling fields. And the strings being passed to jsdiff (or even just concatenated and managed in this module) end up being much smaller. (In some of the pathological cases I had tested, it's on the order of a "tens of bytes vs hundreds of kilobytes" difference.)

Yeah, been playing with this on the v6 branch, and the diffs are sooooo much nicer. It's just hardly any noise, and you can almost always see exactly where the problem is, much more easily. It's not as technically "correct" from an "object diff" point of view, but the object path context is 1000x more useful than the surrounding text context of the fields that actually matched anyway.