Handling circular references
nonara opened this issue · 16 comments
Issue
In a circular reference scenario, it goes into an infinite recursion, which blows the stack.
Reproduction
a = { self: null };
a.self = a;
common_1.deepMerge.all([ a, { b: 2 } ])
Fix Suggestion
When recursively walking objects, maintain a chain Map<key: original, value: cloneOrOriginal (depends on settings)> of references to objects & arrays.
Walk function logic:
- Create a new chain map from the old (We want to be sure that only the direct lineage exists in this map, which is why create a unique copy for each walked node)
- Check if the current object exists in the map.
- If it does, flag as circular reference
- Otherwise, add the current reference to the map
- Handle circular reference
- If there is nothing to merge, set output value to the value from our map
- If there is something to merge and no custom logic exists, throw a circular reference error
- If not circular reference
- Perform normal merge / clone logic & set output value
- Set output value as value for the current reference in our chain map
- Recurse for properties / elements, passing our map
This would probably best be solved by a custom isMergeableObject
that keeps track of what it's compared to, and returns false
if it's an object that has been seen before.
I would be open to linking to such an implementation, or perhaps even putting it into the readme for people to copy/paste.
Updated my original comment with a strategy.
Because of the other issue I posted, this isn't possible for me at the moment. I would suggest directly mitigating this issue in this case. Circular reference issues are bound to happen in many scenarios. This is an issue that we have to deal with in compilers & parsers very often.
In this case, anything with a reference to a node higher up the chain is going to blow the stack. Generally speaking, it's a good idea to ensure that logic is built into recursive functions to ensure that they can never infinitely recurse.
That in mind, proposed strategy above is an approach which covers the issue or throws an appropriate error message if it can't.
If you'd like, I'd be glad to submit a PR.
This would probably best be solved by a custom
isMergeableObject
that keeps track of what it's compared to, and returnsfalse
if it's an object that has been seen before.
Using seen would cause false positives in some cases.
Example:
- Root
- Child1
- Grandchild 1
- Child2
- Grandchild 2
- Child1
If we're walking Child2, Child1 and Grandchild1 would have been 'seen', but because they're not in the ancestral lineage, they can be merged.
That's why we need a direct lineage map to be passed to the recursive function. Without a third parameter, an isMergeableObject
implementation couldn't do this.
But again, this isn't too difficult to solve and could be taken care of with a few extra lines. Let me know if you'd consider a PR.
actually, on reflection, this is a much harder problem that I don't really want to try to solve here – merging two trees makes sense, but merging two cyclical graphs doesn't make sense – what would you even expect the output to be if you tried to merge
const a = { child: { grandchild: { tiny_baby: 'hi } } }
const b = { child: { } }
b.child.grandchild = b
merge(a, b)
That seems nondeterministic, right?
I'm inclined to say "the solution is to eliminate cycles in your objects before trying to merge them"
Edit: I misread the original code... updating my response.
Effectively, that becomes:
const a = { child: { grandchild: { tiny_baby: 'hi' } };
let b;
b = { child: { grandchild: b } };
You're correct that in this case this cannot be merged. Effectively here, if we were to assign a pseudo-AST to it, we have a Node
({ tiny_baby
: 'hi' }) and a ParentReferenceNode
(b
). This would be a non-mergeable scenario. In my proposed method above, I suggest allowing a custom handler function for dealing with circular merge situations, and if one doesn't exist, we use default behaviour.
- Handle circular reference
- If there is nothing to merge, set output value to the value from our map
- If there is something to merge and no custom logic exists, throw a circular reference error
In your example, it would either pass off the conflict to a handler function, or if none is given, it would throw a circular reference error.
As far as whether it should be solved I have two thoughts:
1. It's generally a good idea to ensure that recursive functions cannot ever hit an infinite recursion scenario.
That in mind, putting in some logic to to track this is good practice. And if you're going to do that, the extra bits to allow handling those scenarios isn't too much extra and would help in making it so only those scenarios that should throw an exception do. (ie, there's a reference to a parent but nothing to merge with it, we can just keep the reference as a reference to its merged counterpart)
2. How often will two graphs be merged?
It depends on the context. This is very common in some areas of CS, and not as much in others. (Any object which contains a reference to a parent will become un-mergeable, and this is fairly common.) Ultimately, it's your call. Not supporting it will limit the scopes in which the library can be used, where adding the ability to handle circular references will certainly broaden that and help a lot of people.
At the very least, it will reduce bug reports when people find themselves hitting circular ref issues.
Again, the rules are pretty simple (pseudo-AST again)
-
If one property is a
ParentReferenceNode
, and there is no node to merge- Call a handler function if set to determine output
- (Default if no handler) Output a reference to the corresponding merged parent node
-
If one property is a
ParentReferenceNode
and there is another node to merge- Call a handler function if set to determine output
- (Default if no handler) Throw a circular reference exception
The default behaviour can be changed, but this prevents stack being blown and allows users to have control over circular issues, which will allow the library to be used in more contexts.
It's generally a good idea to ensure that recursive functions cannot ever hit an infinite recursion scenario
I'm amenable to this. If we added any code to address this, my preferred solution would be to throw an error with a clear message as soon as a cycle was detected.
Merging data structures in JS is complicated enough already, I don't want to make my life the hell that it would become by trying to solve general CS "merging cyclical graphs" problems.
Makes sense. How about this:
- Track circular refs
- If
customMerge
is set > Pass a third parameter to thecustomMerge
function calledcircularReferences
. - If
customMerge
is not set or it returns undefined > The default behaviour throws a circular ref exception, per your preference
circularReferences
would be either be undefined
if neither a or b are circular, or it will be a simple map if either is.
The map would be <sourceReference, targetReference>
(see below for example)
Simple, non-breaking, and still allows us hell-dwellers to do our thing, as needed. 😄
Example
The following is how I could implement the behaviour I proposed earlier using this extra param:
let a, b;
a = { child: { parent: a.child } }
b = { child: { hi: 1 } }
merge(a, b, {
customMerge: () => (a, b, circularReferences) => {
if (circularReferences && (a === undefined || b === undefined))
return circularReferences.get(a ?? b); // The value will be a reference to mergedObj.child
}
});
If you're down with that, let me know, and I'll do the PR.
deepmerge should not blow up when merging with circular references. I'd love to see this getting implemented.
deepmerge should not blow up when merging with circular references. I'd love to see this getting implemented.
Please read the thread above – I'm open to throwing a clear error when a cycle is detected, but I do not see an obvious way to return sensical data when merging a cycle.
I think the default behavior should be throwing an error. If custom merge function is provided then that function can determine how to handle cycles.
I think the default behavior should be throwing an error. If custom merge function is provided then that function can determine how to handle cycles.
Agreed. My proposal above should meet those specifications. I'd still willing to do the PR.
@TehShrike I agree that handling circular references is not an easy task, but I think a good starting point is to inspect solutions on stringifying circular references. I think a similar approach could handle circular references here, so it would allow to deep merge objects with circular references without throwing an error.
This implementation would give this package a superior status when it comes to deepmerge-solutions.
I think the default behavior should be throwing an error. If custom merge function is provided then that function can determine how to handle cycles.
I think this is a first good logical step, though.
(Note that I don't want to say that you must implement a proper circular reference handling, I'm just saying it would be super nice to have out of the box)
Unless I'm misunderstanding you, stringifying would not work as it would still have a circular issue.
This sort of thing is not as complicated as it may seem. It's quite common in AST-like scenarios. I apologize if I've not explained the proposed solution clearly.
To clarify, you simply maintain a linear set of parent references during the walk, then pass that set to the custom merge function, if present. The custom merge function can determine what to do if a
or b
are in the passed set. If no custom merge function is present, or the function returns a value that is still a circular reference, it will throw a circular error.
This allows for fast-exit throw behaviour as the default while also providing for custom handling if the user desires to.
Unless I'm misunderstanding you, stringifying would not work as it would still have a circular issue.
@nonara Unfortunately, you misunderstood me. I know that stringify is not a solution to this problem, I was just saying that it has the same problem and it can be worked around. I was just making a point that it's not impossible to fix this, properly.
This sort of thing is not as complicated as it may seem. It's quite common in AST-like scenarios. I apologize if I've not explained the proposed solution clearly.
To clarify, you simply maintain a linear set of parent references during the walk, then pass that set to the custom merge function, if present. The custom merge function can determine what to do if
a
orb
are in the passed set. If no custom merge function is present, or the function returns a value that is still a circular reference, it will throw a circular error.This allows for fast-exit throw behaviour as the default while also providing for custom handling if the user desires to.
I get your point, I just wish that custom function could be part of the lib in the first place. So I seek a generic approach so to speak.
I'm aware of the main concern now, it seems to be that you might try to merge an object that has nested objects that are part of deeper circular references on the opposite object you want to merge with. Simply skipping as soon as we see circular references would skip the deep merge and the result would be weak.
But what about skipping circular references when we come across them and do the deep merge in both directions, merging a in b and then b in a? I think this would successfully merge and handle circular references well.
Little sample:
const a = {
a1 = 1,
a2 = 2,
a3 = 3
};
a.a = a;
const b = {
a1 = 1,
b1 = 1,
c1 = 1,
a = {
a3 = 4
a5 = 5
}
}
Of course we have conflicting values here. If I remember correctly the first object will be in favor of conflicts, it should not? if we favor a
it should look like this:
{
"a1": 1,
"a2": 2,
"a3": 3,
"a5": 5,
"a1": 1,
"b1": 1,
"c1": 1,
"a": {
"a1": 1,
"a2": 2,
"a3": 3,
"a5": 5,
}
(here a.a
should be a circular reference to the root object)
But if we favor b
the resulting object should look like this:
{
"a1": 1,
"a2": 2,
"a3": 4,
"a5": 5,
"a1": 1,
"b1": 1,
"c1": 1,
"a": {
"a1": 1,
"a2": 2,
"a3": 4,
"a5": 5,
}
(here a.a
could be circular or not. If we strictly favor b
it could be loosed, I'm not sure about that.)