caxy/php-htmldiff

HTMLPurifier not writable

AroundtheGlobe opened this issue · 15 comments

I've updated php-htmldiff from version 1.0.0 to 1.0.9 because it couldn't compare a 5,500-word article within the 30-second PHP timeout. After updating, I got this error message:
/vendor/ezyang/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer not writable, please chmod to 777

I've pointed the cache location at a directory with 777 permissions, like this:

$htmlDiffConfig = new HtmlDiffConfig();
$htmlDiffConfig->setPurifierCacheLocation(Config::getDocRoot().'tempdir');
$htmlDiff = new HtmlDiff('text1 ', 'text2', $htmlDiffConfig);

But HtmlDiff doesn't seem to pass that setting on, so there is no way to set the new (temp) directory. Version 1.0.9 also struggled with the large text, so is downgrading the only option I have?

After setting the cache location, do you see the path you set in the error message?

The message above was the only one I got. Today I had Composer download and update all files, and the problem seems to be solved now.

The 5,500-word HTML diff still takes a long time (longer than a normal user would wait). If I wait 400 seconds (about 6.7 minutes), I can now run the diff without any errors.

I am not sure why it takes 7 minutes. One thing I know can take up a lot of time is inline images (base64-encoded src attributes). I also advise you to make sure you run at least PHP 7. I might be able to tell you why your HTML diff is slow if you can provide the data set.
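If inline images do turn out to be the cause, one option is to replace them with a short placeholder before handing the HTML to the diff. This is not something php-htmldiff does itself as far as this thread shows; the function name and the placeholder value below are hypothetical, and the regex only targets `src` attributes holding base64 data URIs.

```php
<?php
// Hypothetical pre-processing helper (not part of php-htmldiff): replaces base64 data URIs in
// src attributes with a short placeholder so the diff does not tokenize the encoded image data.
// Returns the rewritten HTML and the number of attributes that were replaced.
function stripInlineImages(string $html, string $placeholder = 'data:,inline-image'): array
{
    $count = 0;
    $result = preg_replace(
        // src="data:<mime>;base64,<data>" with either quote style; \1 matches the opening quote.
        '/src=(["\'])data:[^"\']*;base64,[^"\']*\1/i',
        'src="' . $placeholder . '"',
        $html,
        -1,
        $count
    );
    return [$result, $count];
}
```

The trade-off of this design is that two different inline images now compare as equal, so image changes no longer show up in the diff output.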

This is the text I'm trying to compare (against an older, different version): https://www.aroundtheglobe.nl/reizen/duitsland/berlijn-si7161.html
It has no encoded src attributes, and I'm using PHP 7.1.x on the server, so that should be okay. Maybe something else in my HTML code is slowing it down?

Just to be clear: are you literally comparing the source of the entire page (including everything like the menus), or just the content of <article class="post-content">, for example?

Only the content of <article class="post-content">
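Diffing only the article keeps menus and other page chrome out of the comparison. A minimal sketch of pulling that fragment out of a full page with PHP's DOM extension; the helper name is hypothetical, and it assumes the class name `post-content` from this thread.

```php
<?php
// Hypothetical helper (not part of php-htmldiff): returns the inner HTML of the first
// <article class="post-content"> element, or null when the page has none.
function extractPostContent(string $pageHtml): ?string
{
    $doc = new DOMDocument();
    // libxml does not know HTML5 tags such as <article>; collect its warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The encoding hint makes libxml read the input as UTF-8 rather than ISO-8859-1.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $pageHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "post-content" as a whole class name, even when the element has other classes.
    $nodes = $xpath->query(
        '//article[contains(concat(" ", normalize-space(@class), " "), " post-content ")]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize the article's children, not the <article> tag itself.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}
```

The old and new fragments returned by this helper would then be the two inputs to the diff.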

I have created some test fixtures based on that page:

fixtures.zip

The performance on my local machine was not too bad, so I'm not sure what the issue is.

[Screenshot from 2019-04-16 23-00-03]

I'm not sure what your server specs are; I ran the test on a Core i7-4790K CPU @ 4.60GHz.

If you want to test this yourself: check out the library, install it using Composer, copy the HTML pages from the zip into /tests/fixtures/Performance (overwriting the existing files), and run vendor/phpunit/phpunit/phpunit --group=performance from the root of the library folder.

@jschroed91 I think we can close this issue, since the original issue seems invalid and the performance issue doesn't really belong in this ticket.

I will try to have a look today to see how it performs on the main server (vs. the dev server). The dev server is an Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz.
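To compare servers outside PHPUnit, a small timing wrapper is enough. A minimal sketch; the helper name and the default limit of 600 seconds are hypothetical, and the wrapper also raises the execution-time limit so the 30-second timeout mentioned at the start of the thread does not cut the run short.

```php
<?php
// Hypothetical helper: runs $work under a raised execution-time limit and measures wall-clock time.
// Returns the callable's result and the elapsed time in seconds.
function timeWithLimit(callable $work, int $limitSeconds = 600): array
{
    // set_time_limit() restarts the timeout counter from zero with the new limit (0 means no limit).
    set_time_limit($limitSeconds);
    $start = microtime(true);
    $result = $work();
    $elapsed = microtime(true) - $start;
    return [$result, $elapsed];
}
```

On each server the closure would wrap the real diff call, for example `function () use ($old, $new) { return HtmlDiff::create($old, $new)->build(); }`, and `printf('%.1f s', $elapsed)` reports the time, so the dev and main servers can be compared on the same input.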

According to UserBenchmark, the 950 has around 35% of the per-core performance of a 4790K, plus there is the clock difference, so I'd expect maybe around 60 seconds on the 950, ignoring memory speed differences. It's not really scientific, but I don't think 400 seconds makes sense.
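The estimate can be written out using only the 35% per-core figure, assuming the diff runs on a single core so its run time scales inversely with per-core performance (a simplification that leaves out the clock and memory differences):

```latex
t_{950} \approx \frac{t_{4790K}}{0.35} \approx 2.9\,t_{4790K},
\qquad
t_{950} = 400\ \text{s} \;\Rightarrow\; t_{4790K} \approx 0.35 \times 400\ \text{s} = 140\ \text{s}
```

Read backwards, the 400 seconds reported on the 950 would correspond to about 140 seconds on the 4790K; any extra slowdown from the clock difference would only lower that figure.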

After uploading everything to the main server, I got the same error again:
/var/www/[mysite]/vendor/ezyang/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer not writable, please chmod to 777

The HtmlDiff code is still the same as above.

To answer this question:

After setting the cache location, do you see the path you set in the error message?

No, I don't see the path I've set in the error message; it shows what I assume is HTMLPurifier's default path: /ezyang/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer
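Not seeing the configured path suggests the config never reached HTMLPurifier. One possible reason, not confirmed in this thread: as the library's README describes it (worth double-checking against the installed version), an HtmlDiffConfig is passed through the static HtmlDiff::create() factory or set with setConfig(), while the constructor's third parameter appears to be the text encoding rather than a config object. A minimal sketch under that assumption; the tempdir path and the sample strings are placeholders.

```php
<?php
// Assumes a Composer install; adjust the autoloader path to your project layout.
require __DIR__ . '/vendor/autoload.php';

use Caxy\HtmlDiff\HtmlDiff;
use Caxy\HtmlDiff\HtmlDiffConfig;

$oldHtml = '<p>text1</p>';
$newHtml = '<p>text2</p>';

// Placeholder cache directory: it must exist and be writable by the web server user.
$cacheDir = __DIR__ . '/tempdir';

$config = new HtmlDiffConfig();
$config->setPurifierCacheLocation($cacheDir);

// Hand the config to the factory instead of passing it as the constructor's third argument.
$htmlDiff = HtmlDiff::create($oldHtml, $newHtml, $config);
// Alternative: $htmlDiff = new HtmlDiff($oldHtml, $newHtml); $htmlDiff->setConfig($config);

echo $htmlDiff->build();
```

If the config is applied, a permission problem should then be reported for the configured directory instead of the vendor path.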

I managed to solve the problem, but I can't explain what caused it. I use Composer on my PC to get the latest files and update the Composer packages. When I compared the vendor/caxy files on my development server (which at some point had started working) with those on the live server, a few directories and files turned out to be missing on the live server. Uploading those files and directories solved the problem.

The missing files in my vendor dir were:
/vendor_path/caxy/lib/Caxy/HtmlDiff/ListDiffNew.php

In the ezyang package, the HTML and URI directories were missing from /vendor_path/ezyang/htmlPurifier/library/DefinitionCache/Serializer.

Uploading the files and directories solved the caching problem, and comparing the HTML text above now takes about 15 seconds on the live server, which is acceptable.
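This fits how HTMLPurifier's serializer cache appears to work: it keeps each definition type in its own subdirectory under Serializer and creates that subdirectory on first use, so missing HTML and URI directories combined with a Serializer directory the web server cannot write to would produce a "not writable" error for Serializer. That is an inference, not something confirmed in the thread. A minimal diagnostic sketch; the function name is hypothetical and the default subdirectory names are the two reported missing above.

```php
<?php
// Hypothetical diagnostic (not part of HTMLPurifier): lists problems with a serializer cache directory.
// The subdirectory names default to the two reported missing in this thread (HTML and URI).
function checkSerializerCache(string $serializerDir, array $types = ['HTML', 'URI']): array
{
    if (!is_dir($serializerDir)) {
        return ["missing directory: $serializerDir"];
    }
    $problems = [];
    foreach ($types as $type) {
        $path = $serializerDir . '/' . $type;
        if (!is_dir($path)) {
            // A missing subdirectory is only a problem if the parent is not writable,
            // because then it cannot be created on first use.
            if (!is_writable($serializerDir)) {
                $problems[] = "missing, and parent not writable: $path";
            }
        } elseif (!is_writable($path)) {
            $problems[] = "not writable: $path";
        }
    }
    return $problems;
}
```

Note that is_writable() answers for the user running the script, so run the check through the web server (or as the same user) rather than from your own shell account, where the result can differ.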

Thank you @SavageTiger for all your support!

@AroundtheGlobe Sorry for the late response, I was a bit busy. I'm happy the issue is solved, though. I think 15 seconds is about what to expect for a text of that size.

cc @jschroed91

Thanks for your help here @SavageTiger !