Potential stack overflow with ConcatenatedValueSource
Opened this issue ยท 4 comments
While working on a specific token, I triggered a stack overflow due to a deep recursion with a concatenated value source. The problem is the following piece from its implementation:
private Trampoline<byte[]> getData(final ImmutableList<Value> values, final BigInteger currentOffset, final BigInteger currentDest, final BigInteger offset, final BigInteger length, final byte[] output) {
if (length.compareTo(ZERO) <= 0) {
return complete(() -> output);
}
if (currentOffset.add(values.head.slice.length).compareTo(offset) <= 0) {
return intermediate(() -> getData(values.tail, currentOffset.add(values.head.slice.length), currentDest, offset, length, output));
}
final BigInteger localOffset = offset.subtract(currentOffset).compareTo(ZERO) < 0 ? ZERO : offset.subtract(currentOffset);
final BigInteger toCopy = length.compareTo(values.head.slice.length.subtract(localOffset)) > 0 ? values.head.slice.length.subtract(localOffset) : length;
System.arraycopy(values.head.slice.getData(), localOffset.intValueExact(), output, currentDest.intValueExact(), toCopy.intValueExact());
return intermediate(() -> getData(values.tail, currentOffset.add(values.head.slice.length), currentDest.add(toCopy), offset, length.subtract(toCopy), output));
}
In the second to last line, a call to getData
is done. This may very well be another concatenated value source. The problem then is that this function is not tail recursive. I made a test for proof, together with a quick fix so I could use it locally in my branch https://github.com/ccreeten/metal/tree/concatenated-overflow
Now, I am thinking this problem may appear in other places as well. For example, the Seq
implementation parses the head, and recursively calls the iterate for the tail. However, if I am not mistaken, we can again trigger an overflow by having a big recursive definition with sequences in the head side.
Of course, this is probably never going to become a problem. The concatenated value source happened with a real world example though. It can probably be solved by creating a different token implementation however.
Interesting find! I haven't looked at this in-depth yet, but I would think that the values.head.slice.getData()
call should just be wrapped in a trampoline?
With regards to implementations such as Seq
, I think it was a conscious decision that we don't expect definitions to be so big that they will cause a stack overflow. At the same time, if we start generating/deriving token definitions, anything is possible...
I think it would not solve the problem, because it is not tail recursive. The way I see it, with a tail recursive trampoline implementation, a call to a trampoline pushes a new stack frame, and a return of a trampoline then pops it. When you call another trampoline inside this function, you are already pushing a next frame, and it being a recursive function, will make this happen again and again. At least, that is what I think ๐
With tokens, it seems like a reasonable trade-off. The most likely way I would see it happen with a manually created token would be with a TokenRef
I would think.
I will debug your example to see if I can come up with a good solution! But I agree the non-tail-recursiveness is a deal-breaker for just wrapping in a trampoline.
Note: if you get a java heap space error when using fold(..., Shorthand::cat)
, then use cat(...)
instead, that uses FoldCat
which is an optimized version and will create a ConcatenatedValueSource
with a single long list.