adamchainz/django-minify-html

It seems we created the same middleware

Closed this issue · 3 comments

Hi there again,

I created this repo some time ago. It has a number of downloads and also gets some traffic.

It essentially does what you are doing here (+ more). But there are some problems with this approach.

As discussed here your approach is inefficient. ( Decoding GZIP is useless ). Also your approach doesn't work for special HTML tags like ( X-Init, X-Effect, X-On | Used by AlpineJS which is used in many Django projects ).

My approach here fixes some of the issues.
Also to allow the usage in other applications I decided to split the main repo in another project.

Also, having two projects for essentially one end goal is kinda pointless, because we will essentially rewrite the same thing without adding any new features. Is there any chance of merging our repos together, or of focusing on one repo (which will essentially allow this module to be used in every Python application)?

Currently my repo has one open issue; once it is fixed, the library will be feature complete. I also have an idea of using esbuild with PyO3 (which minify-html uses, and with which I will replace minify-html) to make the module even faster. I will also add full support for petite-vue in the next year. But for now development is a bit slow, as I am still a beginner in Rust.

Hello again!

I don't believe we're quite rebuilding the same thing. I intended to only minify HTML. Your package takes a different approach - it's trying to do a lot more.

As discussed here your approach is inefficient. ( Decoding GZIP is useless ).

Please can you elaborate? My package is not decoding gzip at the point you linked. The decode() call converts bytes to str.
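
To make the distinction concrete, here is a minimal sketch (standard library only, not code from either package; the sample HTML and the helper name `decodes_as_utf8` are made up for illustration). It shows that `bytes.decode()` only turns bytes into text, while gzip decompression is a separate call:

```python
import gzip

html_text = "<p>Hello, wörld</p>"

# A response body is normally UTF-8 encoded bytes.
raw_body = html_text.encode("utf-8")

# bytes.decode() maps bytes to str; no decompression is involved.
assert raw_body.decode("utf-8") == html_text

# Gzip is a different operation with its own functions.
gzipped_body = gzip.compress(raw_body)
assert gzipped_body[:2] == b"\x1f\x8b"  # gzip magic number
assert gzip.decompress(gzipped_body) == raw_body


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if `data` is valid UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Calling .decode() on gzipped bytes does not un-gzip them; it fails,
# because the compressed stream is not valid UTF-8.
print(decodes_as_utf8(raw_body), decodes_as_utf8(gzipped_body))
```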

Also your approach doesn't work for special HTML tags like ( X-Init, X-Effect, X-On | Used by AlpineJS which is used in many Django projects ).

Can you explain what you mean by "doesn't work"? The attributes are valid HTML attributes, and minify-html seems to process them just fine when I run the examples:

In [1]: import minify_html

In [2]: minify_html.minify('''<div x-init="console.log('I\'m being initialized!')"></div>''')
Out[2]: '<div x-init="console.log(\'I\'m being initialized!\')"></div>'

Any problem that minify-html has with them is something I'd consider fixing upstream in minify-html, rather than patch it up when wrapping it.

I briefly looked at your package's source code, and you're using a lot of regexes. I'm afraid that approach is flawed. You cannot parse HTML with a regex - it's too complicated. It's not a regular language. There's even this Stack Overflow meme answer that says if you try, you'll summon a pony.

To parse HTML, you need to use a real HTML parser like Python's html.parser (which is even not that good, since it doesn't handle many quirks or parts of the actual HTML spec). Here's a post that I wrote on using it for my blog.
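
As a rough illustration of the parser-based approach (a sketch only; the class name `AttributeCollector` and the sample markup are invented here), the standard library's `html.parser` hands you attributes as already-separated name/value pairs, including multi-line Alpine-style values, and it does not confuse attribute-like text inside an element with a real attribute:

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect (tag, attribute name, attribute value) triples."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, name, value) tuples

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs parsed by HTMLParser.
        for name, value in attrs:
            self.found.append((tag, name, value))


sample = """<div x-init="
    console.log('Holla!')
" class="box"><p>Text mentioning x-init="..." is not an attribute.</p></div>"""

collector = AttributeCollector()
collector.feed(sample)
collector.close()

for tag, name, value in collector.found:
    print(tag, name, repr(value))
```

Only the two attributes on the `div` are reported; the `x-init="..."` inside the paragraph arrives as text data instead.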

The need for a real parser is why I'm excited to use minify-html (and only minify-html). It uses a very fast HTML parser.

A secondary problem I think your package has is that you only allow one compression algorithm to be active. That's not how HTTP encoding is intended to work. You should inspect the Accept-Encoding header to determine what compression algorithm the client supports, and use the best one available. This is what django-compression-middleware does. I also think the compression middleware is best separate from the HTML minification middleware. Compression can be applied to all responses, not just HTML ones.
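
A minimal sketch of that kind of negotiation follows. It is not how django-compression-middleware is implemented; the function names and the `SERVER_PREFERENCE` order are assumptions for illustration, and it simply picks the server's favourite coding among those the client accepts with a non-zero q-value:

```python
from typing import Dict, Optional


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Map each coding in an Accept-Encoding header to its q-value."""
    codings: Dict[str, float] = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        codings[name.strip().lower()] = q
    return codings


# Server-side preference order; an assumption made for this sketch.
SERVER_PREFERENCE = ("br", "zstd", "gzip")


def choose_encoding(header: str) -> Optional[str]:
    """Return the preferred coding the client accepts, or None."""
    accepted = parse_accept_encoding(header)
    wildcard_q = accepted.get("*", 0.0)
    for coding in SERVER_PREFERENCE:
        if accepted.get(coding, wildcard_q) > 0:
            return coding
    return None


print(choose_encoding("gzip, deflate, br"))
print(choose_encoding("gzip;q=1.0, br;q=0"))
```

Returning `None` means the response should be sent uncompressed.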

Thanks for writing. I hope this has given you some knowledge and ideas!

I don't believe we're quite rebuilding the same thing. I intended to only minify HTML. Your package takes a different approach - it's trying to do a lot more.

Correct. But the end goal is to have the same minified HTML + some added functionality.

I don't believe we're quite rebuilding the same thing

I still believe that it's the same 😂.

Can you explain what you mean by "doesn't work"? The attributes are valid HTML attributes, and minify-html seems to process them just fine when I run the examples:

Take a look at this repo

I ran this with:

Code
import minify_html

x = minify_html.minify(open('index.html').read(),minify_css=True,minify_js=True)

print(x)
Result
{% load static %} <link href="{% static 'core/css/navbar/navbar.css' %}"rel=stylesheet><nav aria-label="main navigation"class="navbar container"x-data="{
            open: false,
            button_turned: false,
        }"role=navigation x-cloak><div x-init="
            () => {
                switch ( $store.is_mobile ) {
                    case true : {
                        open = false;
                        button_turned = true;
                        break;
                    }
                    case false : {
                        open = true;
                        break;
                    }
                }
            }
        "class=navbar-brand><a class="navbar-item is-clickable"href=https://bulma.io> <img src="{% static 'core/images/logo.png' %}"height=28 width=112> </a><a :class="open ? 'is-active' : ''"@click="open = !open"class="navbar-burger is-clickable"role="button "aria-expanded=false aria-label=menu data-target=navbarBasicExample> <span aria-hidden=true></span> <span aria-hidden=true></span> <span aria-hidden=true></span> </a></div><div class="navbar-menu is-active"id=mainNavbar x-collapse x-show=open><div x-show=open x-transition x-transition.delay.50ms><div :class="$store.is_mobile ? 'is-flex is-flex-direction-column' : ''"class="navbar-start mb-3"><button @click.prevent="
                        () => {
                            window.location = '/'
                        }
                    "@mouseenter="
                        () => {
                            anime({
                                targets: '.animejs__home__icon',
                                color: '#e50000',
                            })
                        }
                    "@mouseleave="
                        () => {
                            anime({
                                targets: '.animejs__home__icon',
                                color: '#d9d9d9',
                            })
                        }
                    "class="navbar-item button is-ghost is-rounded ml-1 is-unselectable"><ion-icon class=animejs__home__icon name=home-sharp></ion-icon> <p>Home</p></button><button @click.prevent="
                        () => {
                            window.location = '/blog/'
                        }
                    "@mouseenter="
                        () => {
                            anime({
                                targets: '.animejs__school__icon',
                                color: '#e50000',
                            })
                        }
                    "@mouseleave="
                        () => {
                            anime({
                                targets: '.animejs__school__icon',
                                color: '#d9d9d9',
                            })
                        }
                    "class="navbar-item button is-ghost is-rounded ml-2 is-unselectable"x-init="
                        () => {
                            const string = '{{ request.path }}';

                            if (string.includes('blog') ) {
                                $el.classList.add('hover')
                            }
                        }
                    "><ion-icon class=animejs__school__icon name=school-sharp></ion-icon> <p>Blog</p></button><button @mouseenter="
                        () => {
                            anime({
                                targets: '.animejs__projects__icon',
                                color: '#e50000',
                            })
                        }
                    "@mouseleave="
                        () => {
                            anime({
                                targets: '.animejs__projects__icon',
                                color: '#d9d9d9',
                            })
                        }
                    "class="navbar-item button is-ghost is-rounded ml-2 is-unselectable"x-init="
                            () => {
                                if ('{{ request.path }}' === '/projects/') {
                                    $el.classList.add('hover')
                                }
                            }
                        "><ion-icon class=animejs__projects__icon name=extension-puzzle-sharp></ion-icon> <p>Projects</p></button></div></div><div class=navbar-end><div :class="$store.is_mobile ? 'has-text-centered' : ''"class=navbar-item><a :class="$store.is_mobile ? 'button': ''"@mouseenter="
                        () => {
                            anime({
                                targets: '.animejs__logo__facebook',
                                color: '#4169e1',
                            });
                            anime({
                                targets: '.animejs__facebook__button',
                                scale: 1.3
                            });
                        }
                    "@mouseleave="
                        () => {
                            anime({
                                targets: '.animejs__logo__facebook',
                                    color: 'hsl(0, 0%, 80%)',
                            });
                            anime({
                                targets: '.animejs__facebook__button',
                                scale: 1
                            });
                        }
                    "class="is-rounded is-dark animejs__facebook__button is-clickable"href="{{ settings.site_settings.SocialMediaSettings.facebook }}"x-effect="
                        () => {
                            switch (button_turned) {
                                case true : {
                                    anime({
                                        targets: '.animejs__facebook__button',
                                        translateX: 0,
                                        easing: 'easeOutSine',
                                        duration: 150,
                                        opacity: 1 ,
                                        scale: 1,
                                    });
                                    break;
                                };
                                case false: {
                                    anime({
                                        targets: '.animejs__facebook__button',
                                        translateX: 40 * 2,
                                        easing: 'easeOutSine',
                                        duration: 150,
                                        opacity: 0,
                                        scale: 0.2,
                                    });
                                    break;
                                };
                            }
                        }
                    "x-init="
                        () => {
                            tippy('.animejs__facebook__button', {
                                content: 'Facebook',
                            });
                        }
                    "> <ion-icon :class="$store.is_mobile ? 'is-position-absolute' : ''"x-init="
                            () => {
                                anime({
                                    targets: '.animejs__logo__facebook',
                                    color: 'hsl(0, 0%, 80%)',
                                    scale: 0.6,
                                });
                            }
                        "class=animejs__logo__facebook name=logo-facebook style=width:100%;height:100%></ion-icon> </a><a :class="$store.is_mobile ? 'button ': ''"@mouseenter="
                            () => {
                                anime({
                                    targets: '.animejs__logo__github',
                                    color: 'hsl(0, 0%, 100%)',
                                });
                                anime({
                                    targets: '.animejs__github__button',
                                    scale: 1.3
                                });
                            }
                        "@mouseleave="
                        () => {
                            anime({
                                targets: '.animejs__logo__github',
                                color: 'hsl(0, 0%, 80%)',
                            });
                            anime({
                                targets: '.animejs__github__button',
                                scale: 1
                            });
                        }
                    "class="is-rounded is-dark animejs__github__button is-clickable"href="{{ settings.site_settings.SocialMediaSettings.github }} "x-effect="
                        () => {
                            switch ( button_turned ) {
                                case true : {
                                    anime({
                                        targets: '.animejs__github__button',
                                        translateX: 0,
                                        easing: 'easeOutSine',
                                        duration: 150,
                                        opacity: 1 ,
                                        scale: 1,
                                    });
                                    break;
                                };
                                case false : {
                                    anime({
                                        targets: '.animejs__github__button',
                                        translateX: 40,
                                        easing: 'easeOutSine',
                                        duration: 150,
                                        opacity: 0,
                                        scale: 0.2,
                                    });
                                    break;
                                };
                            }
                        }
                    "x-init="
                        () => {
                            tippy('.animejs__github__button', {
                                content: 'Github',
                            });
                        }
                    "> <ion-icon :class="$store.is_mobile ? 'is-position-absolute' : ''"x-init="
                            () => {
                                anime({
                                    targets: '.animejs__logo__github',
                                    color: 'hsl(0, 0%, 80%)',
                                    scale: 0.6,
                                })

                            }
                        "class=animejs__logo__github name=logo-github style=width:100%;height:100%></ion-icon> </a><a :class="$store.is_mobile ? 'is-hidden' : ''"@click.prevent="
                            () => {
                                anime({
                                    targets: '.animejs__arrow__back',
                                    rotate: button_turned ? 0 : 180,
                                });
                                button_turned = !button_turned;
                            }
                        "@mouseenter="
                        () => {
                            anime({
                                targets: '.animejs__arrow__back',
                                color: '#e50000',
                            });
                            anime({
                                targets: '.animejs__arrow__button',
                                scale: 1.2,
                            });
                        }
                    "@mouseleave="
                        () => {
                            anime({
                                targets: '.animejs__arrow__back',
                                color: 'hsl(0, 0%, 80%)',
                            });
                            anime({
                                targets: '.animejs__arrow__button',
                                scale: 1
                            });
                        }
                    "class="is-rounded is-dark animejs__arrow__button is-clickable"style=z-index:1000000> <ion-icon x-init="
                            () => {
                                anime({
                                    targets: '.animejs__arrow__back',
                                    easing: 'linear',
                                    duration: 100,
                                    color: 'hsl(0, 0%, 80%)',
                                    scale: 0.6,
                                });
                            }
                        "class=animejs__arrow__back name=arrow-back-outline style=width:100%;height:100%></ion-icon> </a></div><template x-if=!$store.is_mobile>{% if request.user.is_authenticated %} <figure class="image is-48x48 pt-2 pl-2"><img src="{% static 'core/images/placeholder.png' %}"class=is-rounded></figure> {% else %} <div class=navbar-item><div class=buttons><button @click.prevent="
                                () => {
                                    window.location = '/authentication/login/?next={{ request.path }}'
                                }
                            "class="button is-ghost is-rounded">Log in</button></div></div> {% endif %}</template></div></div></nav>

See the problem? ( AlpineJs is very popular with Django by the way ). I opened an Issue in upstream

You should inspect the Accept-Encoding header to determine what compression algorithm the client supports, and use the best one available.

Yep, I am doing that in here, but the approach is kinda flawed. I wanted to give the developer absolute control over their choice. @adamchainz thanks for giving me another way to approach it.

I briefly looked at your package's source code, and you're using a lot of regexes.

That's the only way I can think of that captures the special attributes. Feel free to correct me here.

I also think the compression middleware is best separate from the HTML minification middleware.

Okay? I actually used this with the GZIP middleware at first, and that's why I wrote it like that. But I feel like (feel free to ignore this) we will fall into middleware hell someday if we continue down this path.

The need for a real parser is why I'm excited to use minify-html (and only minify-html). It uses a very fast HTML parser.

python_strip_whitespace = minify-html + this

To parse HTML, you need to use a real HTML parser like Python's html.parser (which is even not that good, since it doesn't handle many quirks or parts of the actual HTML spec). Here's a post that I wrote on using it for my blog.

You should look into this. Note that python regex replacing is faster and more performant than doing what minify-html ( it wraps Golang in Rust and then wraps Rust in Python ) is doing.

A proof of this :

And the code behind it:

Code
# Minifiers
from typing import Optional, Union
from minify_html import minify as rust_minifier

from .html import (
    html_minify as python_minifier,
    mangle_nbsp,
    unmangle_nbsp,
)

# Guess the file content
from .functions import guess

# Import helper functions
from .html import add_line_break


def minify(
    buffer: bytes,
    # Rust
    STRIP_WHITESPACE_RUST_DO_NOT_MINIFY_DOCTYPE: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_ENSURE_SPEC_CONPLIANT_UNQUOTED_ATTRIBUTE_VALUES: Optional[
        bool
    ] = True,
    STRIP_WHITESPACE_RUST_KEEP_CLOSING_TAGS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_KEEP_COMMENTS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_KEEP_HTML_AND_HEAD_OPENING_TAGS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_KEEP_SPACES_BETWEEN_ATTRIBUTES: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_MINIFY_CSS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_MINIFY_JS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_REMOVE_BANGS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_REMOVE_PROCESSING_INSTRUCTIONS: Optional[bool] = True,
    # Python
    STRIP_WHITESPACE_PYTHON_REMOVE_COMMENTS: Optional[bool] = False,
    STRIP_WHITESPACE_PYTHON_CONDENSE_STYLE_FROM_HTML: Optional[bool] = True,
    STRIP_WHITESPACE_PYTHON_CONDENSE_SCRIPT_FROM_HTML: Optional[bool] = True,
    STRIP_WHITESPACE_PYTHON_CLEAN_UNNEEDED_HTML_TAGS: Optional[bool] = True,
    STRIP_WHITESPACE_PYTHON_CONDENSE_HTML_WHITESPACE: Optional[bool] = True,
    STRIP_WHITESPACE_PYTHON_UNQUOTE_HTML_ATTRIBUTES: Optional[bool] = True,
    # NBSP character setting
    STRIP_WHITESPACE_NBSP_MANGLE_CHARACTER: Optional[str] = "'অ'",
    # Compression Settings
    STRIP_WHITESPACE_COMPRESSION_TYPE: Union[
        str("compressed"), str("decompressed")
    ] = str(
        "decompressed"  # Lets default to decompressed bytes
    ),
) -> bytes:
    buffer_type: Union[
        str("gzip"),
        str("br"),
        str("zstd"),
        str("plain"),
    ]
    decompressed_buffer: bytes = b""
    return_buffer: bytes = b""

    # We check if the HTML that the server sent us is compressed or decompressed.
    # If the string is decompressed then just set buffer type to plain
    if STRIP_WHITESPACE_COMPRESSION_TYPE == str("compressed"):
        buffer_type = guess(buffer).lower()
    elif STRIP_WHITESPACE_COMPRESSION_TYPE == str("decompressed"):
        buffer_type = "plain"

    # If the buffer is not plain text, check for compression type.
    # But if the buffer is just plain text, don't do unnecessary checks.
    if buffer_type == "plain":
        decompressed_buffer = buffer
    elif buffer_type == "gzip":
        from .functions.decompressors.gzip import decompress as gz_decompress

        decompressed_buffer = gz_decompress(buffer)
    elif buffer_type == "br":
        from .functions.decompressors.brotli import decompress as br_decompress

        decompressed_buffer = br_decompress(buffer)
    elif buffer_type == "zstd":
        from .functions.decompressors.zstd import decompress as zstd_decompress

        decompressed_buffer = zstd_decompress(buffer)

    #   First change the &nbsp; into a special character so the other compressors cant minify that.
    first_iter: str = mangle_nbsp(
        decompressed_buffer.decode(),
        STRIP_WHITESPACE_NBSP_MANGLE_CHARACTER,
    )
    import time,tracemalloc
    tracemalloc.start()
    now = time.time()
    #   Rust based minifier. The most powerful one in here. 💪
    second_iter: str = rust_minifier(
        first_iter,
        do_not_minify_doctype=STRIP_WHITESPACE_RUST_DO_NOT_MINIFY_DOCTYPE,
        ensure_spec_compliant_unquoted_attribute_values=STRIP_WHITESPACE_RUST_ENSURE_SPEC_CONPLIANT_UNQUOTED_ATTRIBUTE_VALUES,
        keep_closing_tags=STRIP_WHITESPACE_RUST_KEEP_CLOSING_TAGS,
        keep_comments=STRIP_WHITESPACE_RUST_KEEP_COMMENTS,
        keep_html_and_head_opening_tags=STRIP_WHITESPACE_RUST_KEEP_HTML_AND_HEAD_OPENING_TAGS,
        keep_spaces_between_attributes=STRIP_WHITESPACE_RUST_KEEP_SPACES_BETWEEN_ATTRIBUTES,
        minify_css=STRIP_WHITESPACE_RUST_MINIFY_CSS,
        minify_js=STRIP_WHITESPACE_RUST_MINIFY_JS,
        remove_bangs=STRIP_WHITESPACE_RUST_REMOVE_BANGS,
        remove_processing_instructions=STRIP_WHITESPACE_RUST_REMOVE_PROCESSING_INSTRUCTIONS,
    )
    end = time.time()
    print(f'''
        Rust Func

        Traced memory : {tracemalloc.get_traced_memory()},
        Time : {end-now}
    ''')
    tracemalloc.stop()

    #   Rust minifier comes first to mitigate some of the issues I faced.😛
    #   Specially the python module picks '\n in class=""
    #   So first remove all unnecessary whitespace before adding line_break
    third_iter: str = add_line_break(second_iter)

    #   Finally the python iterator.
    #   I don't know how this works.🤷
    #   So adding it at last
    import time,tracemalloc
    tracemalloc.start()
    now = time.time()
    fourth_iter: str = python_minifier(
        third_iter,
        STRIP_WHITESPACE_PYTHON_REMOVE_COMMENTS,
        STRIP_WHITESPACE_PYTHON_CONDENSE_STYLE_FROM_HTML,
        STRIP_WHITESPACE_PYTHON_CONDENSE_SCRIPT_FROM_HTML,
        STRIP_WHITESPACE_PYTHON_CLEAN_UNNEEDED_HTML_TAGS,
        STRIP_WHITESPACE_PYTHON_CONDENSE_HTML_WHITESPACE,
        STRIP_WHITESPACE_PYTHON_UNQUOTE_HTML_ATTRIBUTES,
    )
    end = time.time()

    print(f'''
        Py Func

        Traced memory : {tracemalloc.get_traced_memory()},
        Time : {end-now}
    ''')
    tracemalloc.stop()

    #   Replace special character with &nbsp;
    last_iter: str = unmangle_nbsp(
        fourth_iter,
        STRIP_WHITESPACE_NBSP_MANGLE_CHARACTER,
    )
    last_iter: bytes = last_iter.encode()

    # Compress the buffer
    if buffer_type == "plain":
        return_buffer = last_iter

    elif buffer_type == "gzip":
        from .functions.compressors.gzip import compress as gz_compress

        return_buffer = gz_compress(last_iter)

    elif buffer_type == "br":
        from .functions.compressors.brotli import compress as br_compress

        return_buffer = br_compress(last_iter)

    elif buffer_type == "zstd":
        from .functions.compressors.zstd import compress as zstd_compress

        return_buffer = zstd_compress(last_iter)

    return return_buffer

Can you explain what you mean by "doesn't work"? The attributes are valid HTML attributes, and minify-html seems to process them just fine when I run the examples:

   In [1]: import minify_html

   In [2]: minify_html.minify('''<div x-init="console.log('I\'m being initialized!')"></div>''')
   Out[2]: '<div x-init="console.log(\'I\'m being initialized!\')"></div>'

It seems that you should modify it to :

minify_html.minify(
    '''
    <div x-init = "
        console.log('Holla!')
    "> </div>
    '''
, minify_css=True, minify_js=True)

see the problem.

Essentially this is my very first large project. At the time of writing I didn't know much about good code. So i made a ton of mistakes and it would mean a lot if someone points me in the right direction. I will happily modify my module to suit the audience at large.

Thanks for reading. @adamchainz please gimme your thoughts.

You should look into this. Note that python regex replacing is faster and more performant than doing what minify-html...

Performance at the cost of correctness is not worth it. The regex-based approach simply chokes on HTML that is about, e.g., Alpine. For example, here one of your regexes will capture from x-init=" onwards:

For initial code use <pre>x-init="</pre> and type the appropriate code, ending with <pre>"</pre>.
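
A small sketch of that failure mode is below. The pattern `NAIVE_X_INIT` is hypothetical, written only for this illustration and not taken from either package, and `XInitFinder` is an invented helper. The regex treats the prose as if it contained an `x-init` attribute, while a parser sees no such attribute at all:

```python
import re
from html.parser import HTMLParser

# Prose that talks about x-init; it contains no x-init attribute.
doc = (
    '<p>For initial code use <pre>x-init="</pre> and type the '
    'appropriate code, ending with <pre>"</pre>.</p>'
)

# Hypothetical attribute-matching pattern (an assumption for this sketch).
NAIVE_X_INIT = re.compile(r'x-init="(.*?)"', re.DOTALL)

# The regex captures markup and prose between the two quote characters.
regex_hits = NAIVE_X_INIT.findall(doc)


class XInitFinder(HTMLParser):
    """Record the value of every real x-init attribute."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "x-init":
                self.values.append(value)


finder = XInitFinder()
finder.feed(doc)
finder.close()
parser_hits = finder.values

print("regex: ", regex_hits)
print("parser:", parser_hits)
```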

Also I doubt that running a bunch of Python functions is indeed faster. I would like to see a benchmark. I found that minify-html can minify 10k of HTML in 14 microseconds, which is barely time to make a single Python function call.
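
One possible shape for such a benchmark is sketched below, using the standard library's `timeit` instead of single `time.time()` readings. The helper `best_time_per_call`, the stand-in `collapse_whitespace` function, and the generated payload are all assumptions for illustration; no measured numbers are implied:

```python
import re
import timeit
from typing import Callable


def best_time_per_call(
    func: Callable[[str], str],
    payload: str,
    number: int = 200,
    repeat: int = 5,
) -> float:
    """Return the best observed seconds-per-call over `repeat` rounds."""
    timer = timeit.Timer(lambda: func(payload))
    # Taking the minimum reduces noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def collapse_whitespace(html: str) -> str:
    """Stand-in regex 'minifier' used only to exercise the harness."""
    return re.sub(r"\s+", " ", html)


# Roughly 10 kB of repetitive HTML as a made-up test payload.
payload = "<div>\n    <p>  Hello  </p>\n</div>\n" * 300

seconds = best_time_per_call(collapse_whitespace, payload)
print(f"collapse_whitespace: {seconds * 1e6:.1f} microseconds per call")
```

With minify-html installed, the same helper could be pointed at `minify_html.minify` and at the regex-based minifier, using the same payload for both, to get comparable per-call figures.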

See the problem? ( AlpineJs is very popular with Django by the way ). I opened an Issue in upstream

Great, I added a comment to clarify what your request is.

see the problem.

You still didn't communicate the problem in a single statement: that some bytes could still be saved. IMO this is not the end of the world.

Essentially this is my very first large project. At the time of writing I didn't know much about good code. So i made a ton of mistakes and it would mean a lot if someone points me in the right direction. I will happily modify my module to suit the audience at large.

I appreciate this. I think you've done very well for the effort you've put in. I would like to encourage you to keep on learning. There are plenty of resources out there. I'm afraid I don't have time to mentor you.

I'm going to keep my package as-is. It's minimal and matches my tastes. OSS isn't a "competition", several solutions can exist :)

My main feedback for your package is, again, that I would always avoid regex-based HTML parsing. Try using html.parser!

Good luck!