Possible diff algorithm improvement
maliayas opened this issue · 3 comments
@maliayas Thanks for opening this issue!
This is actually an undesired side-effect of one of the newer features we implemented - isolated tag diffing. I had not actually thought about this until now, so I'm very glad you opened this issue.
Basically, the isolated tag diffing is comparing the italic tag containing "emergency escape and rescue openings" separately from the rest of the content - this is to fix a lot of the issues we had with the diff output not protecting the HTML structure.
The issue here is that in order to diff them in isolation, we actually replace the entire tag with a placeholder "word" before we diff the content. The diffing algorithm is not aware of the length of the string that the placeholder represents, and therefore sees it as 1 word. In this case it is finding a longer match in "shall be" than it is in the placeholder match.
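To make the placeholder effect concrete, here is a minimal Python sketch. It is not the library's code: the sentences, the `[[ITALIC]]` placeholder, and the helper names are made up for illustration, and Python's `difflib` stands in for the real matching algorithm. It only shows how a collapsed tag, counted as one word, can lose to a two-word match.

```python
import re
from difflib import SequenceMatcher

ITALIC = re.compile(r"<i>.*?</i>")
PLACEHOLDER = "[[ITALIC]]"  # hypothetical placeholder "word"


def tokens(html, isolate):
    """Split into words, optionally collapsing each <i>...</i> into one token."""
    if isolate:
        html = ITALIC.sub(PLACEHOLDER, html)
    else:
        html = re.sub(r"</?i>", "", html)  # drop the tags, keep their words
    return html.split()


def longest_match(old, new, isolate):
    """Return the longest run of words shared by both versions."""
    a, b = tokens(old, isolate), tokens(new, isolate)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


# Made-up example: the italic phrase moves from the end to the start.
old = "shall be <i>emergency escape and rescue openings</i>"
new = "<i>emergency escape and rescue openings</i> shall be"

with_isolation = longest_match(old, new, isolate=True)
without_isolation = longest_match(old, new, isolate=False)
```

With isolation on, the five-word italic phrase counts as a single token, so the two-word run "shall be" becomes the longest match and the whole italic tag ends up reported as removed and re-added. With isolation off, the five italic words win instead.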
So, it will take a little bit of work, but it is certainly possible. This will be one of the higher priorities to tackle.
I see. Great explanation. If fixing this will break other stuff, don't worry about this issue. I understand that perfecting a diff library may be quite complex.
Btw, the demo tool is awesome.
Looping back around here - our highest priority for this library was the accuracy of the diff, so unfortunately performance took a back seat to it. However, we do like to leave that decision up to the end users when we can - the config option setIsolatedDiffTags is used to define which tags are diffed in isolation, and currently the defaults include the i and em tags.
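As a rough illustration of what choosing the isolated tags changes, the Python sketch below collapses only the tags named in a list. It is a toy model, not the library's implementation: `collapse_isolated`, the `[[ISOLATED_n]]` placeholder format, and the sample sentence are all hypothetical, and the real signature of setIsolatedDiffTags is not shown in this thread.

```python
import re


def collapse_isolated(html, isolated_tags):
    """Replace every <tag>...</tag> named in isolated_tags with one placeholder.

    Returns the rewritten text plus a map from placeholder to the original
    markup, so the tag contents could be diffed separately afterwards.
    """
    stored = {}

    def stash(match):
        key = f"[[ISOLATED_{len(stored)}]]"  # hypothetical placeholder format
        stored[key] = match.group(0)
        return key

    for tag in isolated_tags:
        name = re.escape(tag)
        pattern = re.compile(rf"<{name}>.*?</{name}>", re.DOTALL)
        html = pattern.sub(stash, html)
    return html, stored


sample = "Doors <i>shall be</i> kept <em>clear</em>"

# Both tags isolated, as the thread says the defaults do for i and em.
both, both_map = collapse_isolated(sample, ["i", "em"])

# Only em isolated: the words inside <i> stay visible to the main diff.
em_only, em_map = collapse_isolated(sample, ["em"])
```

Leaving a tag out of the list means its words take part in the main diff at full length, which is the trade-off the option leaves to the user.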
I'll see if I can update the documentation to highlight the reasoning behind choosing this as the default option this weekend.
Closing this issue.