Allow html diffs to be interrupted / cancelled
GoogleCodeExporter opened this issue · 3 comments
GoogleCodeExporter commented
DaisyDiff's HTML diff can be extremely resource hungry. For example, our wiki
recently encountered a page diff (admittedly a very large one) that caused the
system to run out of memory. Unit testing the difference at fault, I gave it
900MB and it still hit an OutOfMemoryError.
There are numerous ways that the resource use could be limited, e.g. modifying
the LCSSettings' limits or implementing skipRangeComparison. But I'd like to
just make it cancellable externally (because it's easier).
The Eclipse RangeDifferencer at the heart of the HTMLDiffer can be passed an
IProgressMonitor, whose isCanceled method is checked frequently. When the
IProgressMonitor is cancelled, the diffing operation terminates.
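To illustrate the polling pattern described above, here is a minimal sketch. It does not use the real Eclipse classes: `CancelCheck`, `DiffCanceledException` and `CancellableLcs` are hypothetical stand-ins, and throwing an exception on cancellation is an assumption of this sketch rather than a statement about how RangeDifferencer terminates.

```java
// Hypothetical stand-in for the one IProgressMonitor method that matters here.
interface CancelCheck {
    boolean isCanceled();
}

// Hypothetical exception used by this sketch to abandon the computation.
class DiffCanceledException extends RuntimeException {
    DiffCanceledException(String message) {
        super(message);
    }
}

final class CancellableLcs {
    // Longest-common-subsequence length over two token arrays,
    // polling the cancel check once per row of the table.
    static int lcsLength(String[] left, String[] right, CancelCheck monitor) {
        int[] previous = new int[right.length + 1];
        int[] current = new int[right.length + 1];
        for (int i = 1; i <= left.length; i++) {
            if (monitor.isCanceled()) {
                throw new DiffCanceledException("cancelled at row " + i);
            }
            for (int j = 1; j <= right.length; j++) {
                current[j] = left[i - 1].equals(right[j - 1])
                        ? previous[j - 1] + 1
                        : Math.max(previous[j], current[j - 1]);
            }
            // Reuse the two row buffers instead of allocating a full table.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length];
    }
}
```

The point of the sketch is only that the check sits inside the expensive loop, so a caller on another thread can stop the work by flipping whatever `isCanceled` reads.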
I'm working on a simple change to DaisyDiff to include a diff(IProgressMonitor
progressMonitor, TextNodeComparator leftComparator, TextNodeComparator
rightComparator) method that passes that progressMonitor through to the various
calls to RangeDifferencer.
That's the most straightforward way to implement the functionality I need, but
it's not necessarily the best. For example, people might think we shouldn't
leak the reliance on Eclipse classes to client code, which would be pretty
reasonable. Also, I haven't done anything about the rest of the
IProgressMonitor interface at the moment, so it's a bit misleading to take one.
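One possible way around both concerns, sketched here under assumptions, is to expose a small DaisyDiff-owned cancellation type and adapt it to a progress monitor internally. Everything named below is hypothetical: `DiffCancellation` and `CancellationMonitorAdapter` do not exist in DaisyDiff, and `MonitorStandIn` is a reduced stand-in for org.eclipse.core.runtime.IProgressMonitor so that the example compiles without Eclipse on the classpath.

```java
// Reduced stand-in for IProgressMonitor (the real interface has more methods).
interface MonitorStandIn {
    void beginTask(String name, int totalWork);

    void worked(int work);

    boolean isCanceled();
}

// Hypothetical client-facing type: the only thing callers would need to see.
interface DiffCancellation {
    boolean isCancellationRequested();
}

// Internal adapter: progress reporting is ignored, cancellation is forwarded.
final class CancellationMonitorAdapter implements MonitorStandIn {
    private final DiffCancellation cancellation;

    CancellationMonitorAdapter(DiffCancellation cancellation) {
        this.cancellation = cancellation;
    }

    @Override
    public void beginTask(String name, int totalWork) {
        // Intentionally a no-op: this sketch does not report progress.
    }

    @Override
    public void worked(int work) {
        // Intentionally a no-op.
    }

    @Override
    public boolean isCanceled() {
        return cancellation.isCancellationRequested();
    }
}
```

With this shape, client code never touches an Eclipse type, and the API does not promise progress reporting it does not implement.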
Original issue reported on code.google.com by don.jp.w...@gmail.com
on 17 May 2011 at 12:04
GoogleCodeExporter commented
Perhaps it would be better to find out if there are any memory leaks and fix
them first?
I am not sure DaisyDiff was designed to handle gigantic files.
See also issue 21 and issue 23.
Original comment by kkape...@gmail.com
on 19 May 2011 at 8:36
GoogleCodeExporter commented
There's no memory leaked beyond the end of the operation. It's just that as
daisydiff/rangedifferencer go through their machinations they slowly create a
giant set of result definitions. This may be the result of a bug that occurs
for my specific data, but I don't really think so. Doubtless there are
improvements to be made to DaisyDiff's memory usage. But it may never be
achievable to make it handle absolutely any data in limited time and memory.
However, as long as a running diff can be cancelled externally, mitigation for
impossible diffs is feasible.
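As a rough sketch of what such external mitigation could look like, the diff can be run on a worker thread with a time limit, and a shared flag can be set once the limit passes. `DiffWatchdog` and `runWithTimeout` are hypothetical names, and the `Function<AtomicBoolean, T>` parameter stands in for a diff that polls the flag; none of this is DaisyDiff API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

final class DiffWatchdog {
    // Runs the diff on a worker thread. If it does not finish within the
    // limit, the shared flag is set so a cooperative diff can stop, and the
    // TimeoutException is rethrown to the caller.
    static <T> T runWithTimeout(Function<AtomicBoolean, T> diff, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        AtomicBoolean cancelRequested = new AtomicBoolean(false);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<T> task = () -> diff.apply(cancelRequested);
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Ask the diff to stop; it must poll the flag to notice.
                cancelRequested.set(true);
                throw e;
            }
        } finally {
            // No new tasks are accepted; the worker exits once the diff returns.
            executor.shutdown();
        }
    }
}
```

A time limit is only one possible trigger. Since the memory is consumed gradually, a caller could equally set the flag when free heap drops below some threshold.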
I don't consider this issue to be a defect report but an enhancement request
(but I can't set that).
Think of it as a solution to
http://code.google.com/p/daisydiff/issues/detail?id=21#c5 that is based on the
observation that daisydiff takes considerable time to consume memory.
Original comment by don.jp.w...@gmail.com
on 20 May 2011 at 12:29
GoogleCodeExporter commented
Original comment by kkape...@gmail.com
on 20 May 2011 at 7:33
- Added labels: Type-Enhancement
- Removed labels: Type-Defect