nuxodin/diff_match_patch-php

Empty diff on production server

tangor86 opened this issue · 3 comments

Hello!

On my local machine it works fine, but on the production server it behaves strangely in some cases.

For example.

Source text:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Destination text:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
new line here...

Diffs array after $this->diff->diff_cleanupSemantic($this->diffs) - LOCAL:
[0]== equal ==
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
[1]++ add ++
"
new line here..."

-- correct one!

Diffs array after $this->diff->diff_cleanupSemantic($this->diffs) - PRODUCTION:
[0]== equal ==
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
[1]-- del --
""
[2]++ add ++
"
new line here..."

-- WRONG one.
Somehow an empty DELETE change appears (at index 1).

What can be the reason for the different diff arrays?
The servers are set up almost identically: UTF-8 is supported, PHP 5.4+, Apache, Debian 6.

OK, I found out that the cause is this setting:
mbstring.func_overload = 7 on production,
and mbstring.func_overload = 0 on local.
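To compare the two servers quickly, the setting can be read at runtime. This is a minimal sketch, not part of the library: the helper names `func_overload_level()` and `overloads_string_functions()` are made up for illustration. It relies on the documented bit values of `mbstring.func_overload` (1 = mail(), 2 = string functions, 4 = regex functions) and on `ini_get()` returning `false` when a setting does not exist, which is the case on PHP 8 where this setting was removed.

```php
<?php
// Hypothetical helper: current mbstring.func_overload level as an integer.
// ini_get() returns false if the setting is unknown (e.g. PHP 8+).
function func_overload_level()
{
    $value = ini_get('mbstring.func_overload');
    return ($value === false) ? 0 : (int) $value;
}

// Hypothetical helper: bit 2 means strlen(), substr(), strpos() etc.
// are silently replaced by their mb_* counterparts.
function overloads_string_functions($level)
{
    return ($level & 2) !== 0;
}

if (overloads_string_functions(func_overload_level())) {
    echo "Warning: string functions are overloaded by mbstring\n";
}
```

Running this on both machines should show whether the string-function bit is the difference between them.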

Tests (diff_match_patch_test.php) run with mbstring.func_overload = 0 on local:
Tests passed: 148
Tests failed: 9
Total time: 0.17511820793152 ms

With mbstring.func_overload = 7 the tests do NOT finish at all: the script hangs and is terminated by the timeout...
The same happens on production...

After some more research I came to some clear conclusions:

Having mbstring.func_overload = 7 (or another value that overloads the standard PHP string functions) in "php.ini" leads to problems in some cases:

  1. The test cases (diff_match_patch_test.php) do not finish within the script's max_execution_time limit of 30 sec
    (with mbstring.func_overload = 0 they finish on the same machine in just 0.17... ms)

  2. The test example above generates a warning at the top of the page:
    WARNING: mb_strpos(): Empty delimiter in /var/.../diff_match_patch/diff_match_patch.php on line 147

line 147: $i = mb_strpos($longtext, $shorttext);

I found out that the warning occurs because $shorttext is false, so I applied this "hotfix":
line 147: $i = ($shorttext !== false) ? mb_strpos($longtext, $shorttext) : false;

  • It solved the warning, but in my opinion it is a bad solution.

If the preceding code evaluates as expected, should $shorttext ever be false?
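For reference, a slightly stricter variant of the hotfix can be sketched as a wrapper. This is only an illustration under assumptions: `safe_mb_strpos()` is a hypothetical name, not a function of the library, and it still hides the symptom rather than explaining why $shorttext ends up false. Besides `false` it also rejects an empty string, since before PHP 8 `mb_strpos()` raises the same "Empty delimiter" warning for an empty needle.

```php
<?php
// Hypothetical wrapper around mb_strpos() (requires the mbstring extension).
// Returns false instead of emitting a warning when the needle is unusable.
function safe_mb_strpos($haystack, $needle)
{
    // A false or empty needle means an earlier substring call already failed.
    if (!is_string($needle) || $needle === '') {
        return false;
    }
    // Encoding is passed explicitly so the result does not depend on php.ini.
    return mb_strpos($haystack, $needle, 0, 'UTF-8');
}
```

With such a wrapper, line 147 would read `$i = safe_mb_strpos($longtext, $shorttext);`.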

  3. Last one: the example above gives a wrong diff array with, as mentioned, an empty change:
    [1]-- del --
    ""
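As a stop-gap until the root cause is fixed, empty entries could be filtered out of the diff array after the cleanup step. The sketch below assumes that each diff is a two-element array of operation and text, and that the operation codes are -1 (delete), 0 (equal) and 1 (insert) as in the original diff-match-patch; `drop_empty_diffs()` is a hypothetical helper, not something the library provides.

```php
<?php
// Assumed operation codes, following the original diff-match-patch.
// The library may already define these constants, hence the guards.
if (!defined('DIFF_DELETE')) { define('DIFF_DELETE', -1); }
if (!defined('DIFF_INSERT')) { define('DIFF_INSERT', 1); }
if (!defined('DIFF_EQUAL'))  { define('DIFF_EQUAL', 0); }

// Hypothetical helper: remove diffs whose text is empty or not a string.
function drop_empty_diffs(array $diffs)
{
    $kept = array();
    foreach ($diffs as $diff) {
        // $diff[0] is the operation, $diff[1] is the text of the change.
        if (is_string($diff[1]) && $diff[1] !== '') {
            $kept[] = $diff;
        }
    }
    return $kept;
}

// The production result from the example, shortened for readability.
$diffs = array(
    array(DIFF_EQUAL, 'Lorem ipsum'),
    array(DIFF_DELETE, ''),
    array(DIFF_INSERT, "\nnew line here..."),
);
$clean = drop_empty_diffs($diffs);
```

This only removes the visible symptom; the overloaded string functions would still produce the empty entry in the first place.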

I suppose there is an easy fix for all these cases, since they are connected with each other.

I hope my research will help to fix this...

BW,
Eugene


Also, my class vars are changed a little to get better comparison results...
public $Diff_Timeout = 2.0; // was 1.0
public $Diff_DualThreshold = 320; // was 32

Thank you for that!