penibelst/jekyll-compress-html

Feature: Allow to specify more than one type of comments

SanmayJoshi opened this issue · 4 comments

Summary

Currently, user can specify one type of comment, so that commented content is removed in output. Many times users have CSS or JS code in html file, and such CSS or JS code also has comments, removing which would save some memory space.

Current behavior

Only one type of comment can be specified in _config.yml file.

Proposed behavior

More than one type of comments can be specified in _config.yml file.

Possible solution

Allow users to specify multiple arrays of size = 2 each.
For example:
In _config.yml

compress_html:
  comments:
    - ["<!-- ", " -->"]
    - ["/* ", " */"]

Notes

May take a toll on execution time if user specifies more types of comments. But, it would be up to user whether to specify multiple types of comments or not, thus user would have a choice in trade-off.

It can possibly be implemented as follows:

In _config.yml:

compress_html:
  comments:
    - ["<!-- , -->"]
    - ["/* , */"]

In compress.liquid, replacing code [1] with code [2] :

code [1]

{% comment %}Remove comments{% endcomment %}

{% if site.compress_html.comments == "all" %}
  {% assign _comments = "<!-- -->" | split: " " %}
{% else %}
  {% assign _comments = site.compress_html.comments %}
{% endif %}
{% if _comments.size == 2 %}
  {% capture _comment_befores %}.{{ _content }}{% endcapture %}
  {% assign _comment_befores = _comment_befores | split: _comments.first %}
  {% for _comment_before in _comment_befores %}
    {% if forloop.first %}
      {% continue %}
    {% endif %}
    {% capture _comment_outside %}
      {% if _carry %}{{ _comments.first }}{% endif %}
      {{ _comment_before }}
    {% endcapture %}
    {% capture _comment %}
      {% unless _carry %}{{ _comments.first }}{% endunless %}
      {{ _comment_outside | split: _comments.last | first }}
      {% if _comment_outside contains _comments.last %}
        {{ _comments.last }}
        {% assign _carry = false %}
      {% else %}
        {% assign _carry = true %}
      {% endif %}
    {% endcapture %}
    {% assign _content = _content | remove_first: _comment %}
  {% endfor %}
  {% if _profile %}
    {% assign _profile_comments = _content | size | plus: 1 %}
  {% endif %}
{% endif %}
code [2]

{% comment %}Remove comments{% endcomment %}

{% if site.compress_html.comments == "all" %}
  {% assign _comments = "<!--,-->|/*,*/" | split: "|" %}
{% else %}
  {% assign _comments = site.compress_html.comments %}
{% endif %}

{% for type in _comments %}
  {% assign _comment_item = type | split: "," %}
  {% if _comment_item.size == 2 %}
    {% capture _comment_befores %}.{{ _content }}{% endcapture %}
    {% assign _comment_befores = _comment_befores | split: _comment_item.first %}
    {% for _comment_before in _comment_befores %}
      {% if forloop.first %}
        {% continue %}
      {% endif %}
      {% capture _comment_outside %}
        {% if _carry %}{{ _comment_item.first }}{% endif %}
        {{ _comment_before }}
      {% endcapture %}
      {% capture _comment %}
        {% unless _carry %}{{ _comment_item.first }}{% endunless %}
        {{ _comment_outside | split: _comment_item.last | first }}
        {% if _comment_outside contains _comment_item.last %}
          {{ _comment_item.last }}
          {% assign _carry = false %}
        {% else %}
          {% assign _carry = true %}
        {% endif %}
      {% endcapture %}
      {% assign _content = _content | remove_first: _comment %}
    {% endfor %}
    {% if _profile %}
      {% assign _profile_comments = _content | size | plus: 1 %}
    {% endif %}
  {% endif %}
{% endfor %}

I'm going to rant for a bit because this is the umpteenth time (#95 #88 #85 #82 #80 ...) I've seen this feature request, and I want somewhere to point the next umpteen. (So don't take it personally)

There are way too many edge cases to make this remotely worth it.

  • A JS comment shouldn't count outside of script tags
  • Does a JS comment end when a script tag ends?
  • Should an HTML comment count inside of script tags?
  • What about HTML comments inside script tag attributes?
  • What about HTML comments inside JS strings?
  • What about JS comments inside JS strings?
  • What about script tags inside JS strings?
  • What about JS comments inside quotes inside JS strings?
  • What about JS comments inside quotes inside JS strings inside script tags that happen to be inside <pre> where we disable compression entirely to preserve whitespace?!
  • What about CSS comments??!?

Now add that because it's implemented by splitting, you can't nest anything. Nest any 2 of the above and the code won't work.

The fastest way to implement this would be to make a fully fledged dom parser. In liquid.

Do you want a render time measured in bytes per hour? Because that is how you get a render time measured in bytes per hour.

Just keep your JS in a separate file

Oh...no worries. I had searched in issues but couldn't find any with current title as my phrase(this issue is literally umpteenth, sry for that); I hope this issue will help other users.

I see you have plenty of points. I hadn't given it thought to the extent you have. Anyway, it is nice to know that its absence is intentional. :)

i agree with @jnvsor: «Just keep your JS in a separate file».