AsyncBanana/microdiff

Circular references

majg0 opened this issue ยท 9 comments

majg0 commented

I don't think this should be supported, because of the extra overhead either in performance or in API complexity, but I think the documentation should state clearly that it's not handled. I mean, reading the code this is obvious, but it would still be a good service :)

P.s. do you align with the handmade manifesto? If you haven't heard of it, do check it out, because it seems you do! ๐Ÿ˜Š๐Ÿ™Œ

Cheers

Don't be silly, the overhead in both complexity, memory use, and code size is negligible (see for yourself 3b6ab8f), and it serves the important purpose of not creating situations where using this library would potentially crash the process or exhaust your server's memory. For such a small addition (less than 2 lines of code) that's a huge win.

majg0 commented

That totally depends on the size of the objects. If the user knows that there are no circular references, why have the runtime check it at all?

For big JSON blobs with many objects, this will severely impact performance by having to

  1. reallocate the array
  2. linearly search for references for every addition

Sure, it is assumed that it is small, but performance almost doesn't matter in the small anyway, so why micro-optimize then?

Benchmark it.

majg0 commented

I don't have the incentive to put that amount of time in, I don't even use the lib. Just wanted to put this issue out there.

I will try to benchmark this later. If the results show that cyclical reference checking slows nested objects down enough, I will add an option to disable the checking.

For most objects with low depth, as expected, there is little to no difference. However, with highly nested objects, I noticed a difference of around 30%. Should I add an option to disable it? Having the choice would increase code size slightly and reduce performance slightly, but it would overall be faster if you disable checking, and you can still keep checking if you want.

majg0 commented

Great work! ๐ŸŽ‰๐Ÿ™Œ

One thing we could try before using a flag is to use a Set instead of an Array. How does that affect performance?

I think a 3% pessimization or so is where I'd feel alright about it personally, because if one only diffs large JSON blobs which just arrived serialized over the network, there is no need to check for circular dependencies, so it would feel silly to miss out on something like every 20th diff (in the case of 5% e.g.)

I decided to add an option in v1.2.0, so by default, cycles are checked, but you can disable it as specified in the README and release notes.

majg0 commented

Nicely done!