tbela99/gzip

[Bug] Broken URL Attributes when Minify HTML is enabled

FaySmash opened this issue · 68 comments

When Minify HTML is turned off, the image is displayed correctly and the url is valid:

  • 1 default

  • 2
    default

When the Minify HTML option is turned on the url becomes invalid and the image can't be displayed:

  • 1 all / in the url path get replaced by ="" :
    minify HTML

  • 2 the leading / is missing:
    minify HTML

That issue might be fixed in the next release branch https://github.com/tbela99/gzip/tree/v.next

It is almost ready to be released. I was still testing it. Can you give it a shot?

Thanks

@tbela99 I installed the testing branch and can confirm, that my initial problem is resolved, the urls are now handled properly. (e.g.: //csn-alpha.hosting134897.a2e31.netcup.net/images/banners/Gamesroom.webp)

BUT

Everything else is broken now.

  • Current Version:
    before
    HTML of the Ticket Button:
<div id="jde-q2150115467813748jk" class="jdb-element jde-q2150115467813748jk jdb-element-default jdb-button jdb-button-primary jdb-animation" jdb-animation="type:none"><a class="jdb-button-link" href="/tickets" title="zu den Tickets">
   zu den Tickets</a></div>
  • Next RC:
    after
    HTML of the Ticket Button:
<div id="jde-q2150115467813748jk&quot;class=jdb-element" jde-q2150115467813748jk="" jdb-element-default="" jdb-button="" jdb-button-primary"="" jdb-animation="type:none" class="jdb-animation"><a class="jdb-button-link" href="/home/tickets" title="zu den Tickets"> zu den Tickets</a></div>

I will work on that.

Thank you

I am not able to reproduce that issue. maybe you have an unclosed quote in your content

I finally found that the Joomla print button has unescaped < and > that can cause this issue

title="Print article < Your Modules >"

This invalid HTML is in the Joomla print article button

Hi @tbela99 thanks for your work! Just tried out the newest 2.7.3 Build an I'm afraid to tell you that it's just as broken as before.

Here is the front page if my website without HTML minifying enabled:
without minifying.html.txt

And with HTML minifying enabled:
with minifying.html.txt

(I replaced my domain with example.com)

Can you disable the plugin and provide that html?

I usually use tools like the w3 validator to catch errors in HTML

Here is the HTML with gzip completely disabled:
gzip disabled.html.txt

And for the record: none of the HTML source code was typed by me, it's all automatically generated by astroid framework and jdbuilder from joomdev

The problem comes from your HTML page. It has too many unclosed quotes ("). Here is a sample

<script src="/misc/custom JS/set global variables.js"
"></script>
	<script src="/misc/custom JS/get display mode.js"
"></script>
	<script src="/misc/custom JS/set display mode.js"
"></script>
	<script src="/misc/custom JS/set Menu Toggle.js"
"></script>
	<script src="/misc/custom JS/hide header on scroll.js"
"></script>
	<script src="/misc/custom JS/anchor smooth scroll.js"
"></script>
	<script src="/misc/custom JS/JDBuilder/set Accordion Mode.js"
"></script>
	<script src="/misc/custom JS/JDBuilder/toggle Tab.js"
"></script>
	<script src="/misc/custom JS/astroid.js""></script>

You can validate you HTML markup using the w3 html validator https://validator.w3.org/

Hi, I finally got around fixing the unclosed quotes of my specified scripts and also removed all spaces in their URI, but the issue persists.
This is the output of the W3 Validator:

validator

Every error but the spaces in the images filenames are out of my hands because all those code is generated by astroid framework and JDBuilder.
What does gzips minify option actually do? Wouldn't it be enough to remove all line breaks and comments?

It produce a valid HTML5 document as specified here http://www.w3.org/TR/html5/syntax.html#syntax-tag-omission

the code can be seen here https://github.com/tbela99/gzip/blob/master/helpers/HTML.php#L81

Can you provide the HTML code without the plugin active?

Thank you

Interesting, didn't know you could omit those, til.

Here is the HTML with gzip disabled:
gzip disabled.html.txt

After having a look at the code, I think this line causes the breakage:

//remove quotes from HTML attributes that does not contain spaces; keep quotes around URLs!
'~([\r\n\t ])?([a-zA-Z0-9:]+)=(["\'])([^\s\3]+)\3([\r\n\t ])?~' => '$1$2=$4$5'

or at least it acts strangely at the jdb-animation attribute
regex

I don't see any problem with that. I quickly checked the page and there should be at least one space character between two attributes

id="jdc-fi00013605490449oz"class="jdb-column

I don't know if this can cause the issue you are talking about

it should be id="jdc-fi00013605490449oz" class="jdb-column

I tested the HTML you provide, the issue happens because of the problem I previously mentioned. I will fix it.

Thank you! I've also opened an issue about this: joomdev/JD-Builder#21

Maybe we wait what the folks at joomdev have to say or you rework your code for the sake of compatibility.

You might not be the only one having that problem. The plugin should not produce invalid output anyway

Can you verify that the issue is fixed on this branch https://github.com/tbela99/gzip/tree/v.next ?
Thanks

This worked wonders, the front page is working now! But sub pages are still broken :[
But this time it's because HTML minifying removes leading / in front of URIs in attributes.

<link href="/cache/z/ssl/cn/example.com/css/iLok7.css" rel="stylesheet">
becomes
<link href="cache/z/ssl/cn/example.com/css/iLok7.css" rel="stylesheet">

I know, this is intended, but it results in this request
minifying disabled

turning into this request

minifying enabled

(note the added /angebot in front of /cache which makes the URI invalid, resulting in 404)

That is your problem. The second link returns 404 error, which means that path cannot be resolved by the plugin. Links have a specific structure. modifying it manually can lead you to a 404 error.

link and meta without the leading / are perfectly valid. I will close this issue soon, since the problem is fixed

The problem comes from the minifying HTML option, no doubt. I haven't manually modified the link structure, gzip removed the leading /.

and NO, Links with a missing / are not valid, read here: https://webmasters.stackexchange.com/questions/56840/what-is-the-purpose-of-leading-slash-in-html-urls

with leading slash the image loads (200):
with slash

without it the URI is invalid (404):
without slash

I see what you mean. Links without leading / are perfectly valid but they are resolved using the current page url

so referring "image/something" form /page/news will give
/page/news/image/something

Can you open a different issue with the steps, configuration settings, .... to reproduce the problem? I will close this issue as it is already fixed. You are talking about a different issue

I see what you mean. Links without leading / are perfectly valid but they are resolved using the current page url

yes, and this is causing the problem and is unintended behavior, they should be resolved from the webroot as they are with HTML minifiying disabled.

Can you open a different issue with the steps, configuration settings, .... to reproduce the problem? I will close this issue as it is already fixed. You are talking about a different issue

please have a look at my opening Post which stated exactly the issue with the leading / missing in #2, so I wouldn't say "this issue [...] is already fixed". To reproduce the problem you could just visit a sub-site of you webroot, e.g. /page/news with an image in it which residents in image/something and enable HTML minifiying.

I can see thee image name contains space, can you remove the space in the name and try again?

I removed them but this doesn't change a thing, the problem is still the missing leading slash as mentioned here...

with slash

without slash

Because the image was just an example to show that the leading slash matters. When HTML minifying is enabled the image actually loads just fine, what doesn't load are all javascript and css files referenced in the <head> section, like the css file shown in here

Can you check this please:

You must have a file called .htaccess in the root of your web folder, otherwise files may not load when using the cache

You can rename the file called htaccess.txt to .htaccess

after that go to your joomla admin -> global configuration
Enable URL rewriting -> On

URL-Rewrite has been enabled on my joomla instance even before I used gzip. The htaccess.txt has also been renamed to .htaccess since then.

And that the files won't load has nothing to do with the cache but the URI being made invalid by removing the leading /. Couldn't you just please make a commit to the v.next branch which disables the part of the minifying Code which removes the leading / or URIs and I'll report back if sub-sites then work as intended?

I will check again but the minify code is not supposed to change anything in the attributes.
I must first be able to reproduce your problem, otherwise it is hard to narrow down the faulty code.

If I am right, you said your html and css files are in the <head> section, which means you may have the javascript and css settings set to OFF. Is that the case?

I will check again but the minify code is not supposed to change anything in the attributes.

Oh, I haven't found anything and though I was looking at the wrong spot, guess this code isn't the cause then.

If I am right, you said your html and css files are in the section, which means you may have the javascript and css settings set to OFF. Is that the case?

I admit I should have written "normally referenced in the section" because, in fact, the problematic links were referenced at the end of the <body> section
Because Process Javascript and Process CSS were indeed turned ON. So I checked what happend when I toggle Process Javascript/CSS...

If I set Process Javascript and Process CSS to OFF the leading / is present and the error disappears.

BUT

If I set Process Javascript and Process CSS to ON and minify HTML to OFF, the the leading / is also present and the error also disappears.

CONCLUSION:
The error only appears if both HTML minifying and Process Javascript/CSS are turned ON. If either Process/CSS or minifying HTML is turned OFF, the error disappears.

Voilà! :)

I will investigate this ASAP

It is really hard for me to reproduce your issue
I have minify, html, js, images and css features ON but I cannot reproduce your issue

image

Are your serving files from the top directory? Can you give more details about your configuration? PHP? Joomla?

Also my plugin runs after the Joomla SEF plugin

if possible you should also check the error_log file

In your screencap, the URI of a image inside the normal code body is shown. Note, that my problem doesn't affect images, but only referenced scripts/stylesheets (which would in the <head> section by default).

What do you mean by "top directory"? Website root of my site is just example.com/, not example.com/site/ if you mean this.
PHP is 7.3.15, joomla is 3.9.16, astroid framework is 2.3.0, JDBuilder is 1.5.0.

I also tried disabling all extensions but gzip, astroid and JDBuilder to no avail.

Maybe I could send you a DB dump + webroot backup of my site for further debugging?

And my error log file only contains logs of the failed web requests (ever /angebot should just be /)

Log Access 200 GET /angebot/maid-cafe Apache
Error 404 GET /angebot/media/jdbuilder/js/jquery-3.4.1.min.js Apache
Error 404 GET /angebot/media/system/js/mootools-core.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/css/astroid-596c9670e297083b43197ae168a111b4.css Apache
Error 404 GET /angebot/misc/CSS/outdatedbrowser.min.css Apache
Error 404 GET /angebot/media/jdbuilder/css/jdb-8b3fb93e3bf37c1d6cf0713865dbad8a.min.css Apache
Error 404 GET /angebot/media/astroid/assets/css/animate.min.css Apache
Error 404 GET /angebot/media/system/js/core.js Apache
Error 404 GET /angebot/templates/system/css/system.css Apache
Error 404 GET /angebot/misc/CSS/astroid.css Apache
Error 404 GET /angebot/templates/astroid_template_zero/css/style-450c2e70cb0e3f095476390c96488c2e.css Apache
Error 404 GET /angebot/media/system/js/mootools-more.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/bootstrap/popper.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/bootstrap/bootstrap.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/bootstrap/jquery.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/jquery.astroidmobilemenu.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/jquery.easing.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/script.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/jquery.jdmegamenu.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/jquery.offcanvas.js Apache
Error 404 GET /angebot/misc/JS/get_display_mode.js Apache
Error 404 GET /angebot/misc/JS/set_global_variables.js Apache
Error 404 GET /angebot/misc/JS/outdatedbrowser.min.js Apache
Error 404 GET /angebot/misc/JS/set_Menu_Toggle.js Apache
Error 404 GET /angebot/misc/JS/set_display_mode.js Apache
Error 404 GET /angebot/misc/JS/anchor_smooth_scroll.js Apache
Error 404 GET /angebot/misc/JS/hide_header_on_scroll.js Apache
Error 404 GET /angebot/misc/JS/JDBuilder/set_Accordion_Mode.js Apache
Error 404 GET /angebot/misc/JS/JDBuilder/toggle_Tab.js Apache
Error 404 GET /angebot/misc/JS/astroid.js Apache
Error 404 GET /angebot/media/jdbuilder/js/jdb.noconflict.js Apache
Error 404 GET /angebot/media/jdbuilder/js/jdb.min.js Apache
Access 200 GET /images/sponsors/Inucon.webp Apache
Access 200 GET /images/sponsors/koneko.webp Apache
Access 200 GET /images/sponsors/Cafe%20Into%20Wonderland.webp Apache
Access 200 GET /images/sponsors/Carlsen%20Manga.webp Apache
Access 200 GET /images/sponsors/Franco.webp Apache
Access 200 GET /images/sponsors/Xmas%20Con.webp Apache
Access 200 GET /images/sponsors/Tanzen%20bei%20Pelzer.webp Apache
Access 200 GET /images/sponsors/KAZE.webp Apache
Access 200 GET /images/sponsors/Ticon.webp Apache
Error 404 GET /angebot/templates/astroid_template_zero/css/astroid-596c9670e297083b43197ae168a111b4.css Apache
Error 404 GET /angebot/misc/CSS/outdatedbrowser.min.css Apache
Error 404 GET /angebot/media/system/js/mootools-core.js Apache
Error 404 GET /angebot/media/system/js/core.js Apache
Access 200 GET /images/sponsors/Inucon.webp nginx
Access 200 GET /images/sponsors/Cafe%20Into%20Wonderland.webp nginx
Access 200 GET /images/sponsors/koneko.webp nginx
Access 200 GET /images/sponsors/Carlsen%20Manga.webp nginx
Access 200 GET /images/sponsors/Franco.webp nginx
Access 200 GET /images/sponsors/Xmas%20Con.webp nginx
Access 200 GET /images/sponsors/Tanzen%20bei%20Pelzer.webp nginx
Access 200 GET /images/sponsors/KAZE.webp nginx
Access 200 GET /images/sponsors/Ticon.webp nginx
Error 404 GET /angebot/media/system/js/mootools-more.js Apache
Error 404 GET /angebot/media/jdbuilder/js/jquery-3.4.1.min.js Apache
Error 404 GET /angebot/media/jdbuilder/js/jdb.noconflict.js Apache
Error 404 GET /angebot/media/jdbuilder/js/jdb.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/bootstrap/jquery.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/jquery.easing.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/bootstrap/popper.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/bootstrap/bootstrap.min.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/jquery.astroidmobilemenu.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/jquery.jdmegamenu.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/vendor/jquery.offcanvas.js Apache
Error 404 GET /angebot/templates/astroid_template_zero/js/script.js Apache
Error 404 GET /angebot/misc/JS/outdatedbrowser.min.js Apache
Error 404 GET /angebot/misc/JS/set_global_variables.js Apache
Error 404 GET /angebot/misc/JS/get_display_mode.js Apache
Error 404 GET /angebot/misc/JS/set_display_mode.js Apache
Error 404 GET /angebot/misc/JS/set_Menu_Toggle.js Apache
Error 404 GET /angebot/misc/JS/hide_header_on_scroll.js Apache
Error 404 GET /angebot/misc/JS/anchor_smooth_scroll.js Apache
Error 404 GET /angebot/misc/JS/JDBuilder/set_Accordion_Mode.js Apache
Error 404 GET /angebot/misc/JS/JDBuilder/toggle_Tab.js Apache
Error 404 GET /angebot/misc/JS/astroid.js Apache

any updates on this matter?

Can you share those files with me please? you can build a docker image or shared a link where I can grab them

I got those files, I will investigate your issues
thanks

From want I can see there is a problem with whitespace in the file names like this
https://domain/images/mascots/Chi%20und%20Sana.webp

after verification I see absolutely no problem. I was using http and the browser was redirecting the links to https. after I used https I saw no 404

image

my guess is you might have turn "Upgrade Insecure Request" ON under the CSP tab. This will tell the browser to load urls only with https eventhough you use http

image

Can you check that please?

I also noticed that you had http cache setting set to OFF

image

my guess is you might have turn "Upgrade Insecure Request" ON under the CSP tab. This will tell the browser to load urls only with https eventhough you use http

Can you check that please?
All Settings are exactly the same as in my instance, because it's a direct export of it (I deleted every but two sites and removed unnecessary addons tho).
Upgrade-Insecure-Requests is set to Yes, never cause Problems tho because all my content is served locally via https.

I also noticed that you had http cache setting set to OFF
That's because the site is still WiP and I dind't want to run into caching related problems while debugging.

What sites have you tested? There should be two:

  • Home
  • home2

Home is served directly at root level (/) while home2 is served as subpage (/home2).

  • Home has no problems whatsoever
  • home2 has the described problems of missing /

Those Sites are exactly the same, the only difference is where they're published.

Weren't you able to access /home/home2 at all before you renamed htaccess.txt?

Have you checked if leading / were missing or present?

You need to have a .htaccess in your web root folder if you enable url rewrite, otherwise the home2 will not be available

I know and I have. home2 is avaliable

this site works

this site hast issues (look into the console output inside the developer console)

this is 1:1 the website I exported the sql dump and webroot zip from.

this is a 500 error. I suggest you check you error_log file for more details. I did not have those on my local install

image

I checked the file url. There is a /home/ in the path that causes the 500 error.

https://csn-alpha.hosting134897.a2e31.netcup.net/home/templates/astroid_template_zero/js/vendor/bootstrap/jquery.min.js

I am not sure this is related to the plugin but i will check ASAP

it isn't a copy, it's the same site, just published two times. Check the Menu Tab in joomla to verify this.

I checked the file url. There is a /home/ in the path that causes the 500 error.

that's what I'm trying to say since a month...

the /home is there because the leading / of the linked sources is missing. the problem disappears when I disabled gzip, so the plugin has to cause this.

The problem doesn't appear on "Home" because it's on the website root, it appears on "home2" because it's a sub-page.

The bug is fixed here d0ffb09,

Because the minify process strips quotes around tags and you were not using other settings (js|css), the joomla SEF plugin failed to rewrite those files path. I have fixed it.

Also, you should set dimensions of your images. They are very large. I see a 3780 pixels width image displayed as a 350px. When you set dimensions, the plugin can automatically resize them

the fix is in this branch https://github.com/tbela99/gzip/tree/v.next

Thank you for your patience :)`

Thanks for your hard work!

Because the minify process strips quotes around tags and you were not using other settings

(js|css), the joomla SEF plugin failed to rewrite those files path. I have fixed it.
sounds logical!

Also, you should set dimensions of your images.

I'll do it just before the site goes live.

the fix is in this branch https://github.com/tbela99/gzip/tree/v.next

fix confirmed working!
this resolves issue #2 stated in OP.


but the issue #1:

all / in the url path get replaced by ="" still remains :(

maybe it's just because of bad html and spaces in file paths, I could check this myself. I've inserted banners into the home and home2 pages on my demo site anyway if you want to have a look (the banners should be displayed on the blank section at the top, inspect it with the inspector to see the broken image path).

That issue has been fixed

image

That issue has been fixed

Yeah, it is and it's working for me too. But this fixes only my 2nd initial issue (missing leading /) but not my 1st initial issue (/ getting replaced by ""= ) which happens to the header image (Search the page HTML for "Gamesroom" to find the broken url)
Both issues are probably unrelated and only one of them is fixed now.

the hilighted section sould be an image

grafik

I will check that, but remember that fix was applied only when minify setting was set to ON

You remember we fixed that issue. The fix only work when you enable minification. Because it adds an extra step, it may be more interesting to implement it as a setting that can be enabled on its own. What do you think about it?

that is a different issue. I will take a look at it

can you provide another dump? I am not able to reproduce that issue on my local copy.
Thanks

Like I said. the bug is not in the plugin itself but probably in astroid framework. When I check its code, i can see many references to background-image

grep -r "background-image" webroot/libraries/ | less

a fix could be preserve quotes around style. I will explore that option

fixed! finally! :)
The fix is in this branch https://github.com/tbela99/gzip/tree/v.next

Like I said. the bug is not in the plugin itself but probably in astroid framework.

Not unlikely, the framework is still young and isn't free of bugs.

fixed! finally! :)

🎉
This issue is now Closed for good, after 2 months of hard work :)

Please keep my Website dump a little longer tho, because I already discovered two other bugs unrelated to HTML minifiyng, I'll open new issues for them.