ampproject/amp-toolbox-php

Varnish <esi> tags are being stripped by DOMDocument

Closed this issue · 4 comments

Hi,

We are using varnish as a reverse proxy for ESI blocks (even on AMP pages) and this library alters <esi:include> tags, so varnish is not able to recognize them.

we would like to be able to configure amp-toolbox-php to tolerate the presence of esi:* tags
cf https://docs.oracle.com/cd/B14099_19/caching.1012/b14046/esi.htm

Yes, this is a known problem with DOMDocument, as the <esi> tags can end up in place that make the document invalid HTML5.

We already have code in place for other constructs that DOMDocument's parsing chokes on, so we can add <esi> tags as another of these edge cases to support.

Then later in validation we'd need a way to ignore esi:include tags. This would need to be opt-in.

@schlesseracould you hint me on these "others constructs", so I might contribute a solution.

@killerwolf These happen in AmpProject\Dom\Document.

We have some transformations that we do before the markup is turned into a DOM here: https://github.com/ampproject/amp-toolbox-php/blob/0.5.2/src/Dom/Document.php#L506-L511

Then we revert some of these transformations again when we go back from DOM to markup: https://github.com/ampproject/amp-toolbox-php/blob/0.5.2/src/Dom/Document.php#L608-L613

Here's a rough example of how the fix should be done: contao/core-bundle#1360 (comment)

However, I'd refrain from using a random ID (which can theoretically lead to conflicts) and instead using and incrementing index instead.