It seems we created the same middleware
Closed this issue · 3 comments
Hi there again,
I created this repo some time ago. It also has and
Which also has some traffic :
It essentially does what you are doing here (+ more). But there are some problems with this approach.
As discussed here your approach is inefficient. ( Decoding GZIP is useless ). Also your approach doesn't work for special HTML tags like ( X-Init, X-Effect, X-On | Used by AlpineJS which is used in many Django project ).
My approach here fixes some of the issues.
Also, to allow usage in other applications, I decided to split the main repo out into another project.
Also, two projects with essentially one end goal is kind of pointless, because we will essentially rewrite the same thing without adding any new features. Is there any chance of merging our repos together or focusing on one repo (which would allow this module to be used in every Python application)?
Currently my repo has an issue which, once fixed, will make it a feature-complete library. I also have an idea of using esbuild with PyO3 (which minify-html uses, and with which I will replace minify-html) to make the module even faster. I will also add full support for petite-vue in the next year. But for now the development is a bit slow, as I am still a beginner in Rust.
Hello again!
I don't believe we're quite rebuilding the same thing. I intended to only minify HTML. Your package takes a different approach - it's trying to do a lot more.
As discussed here your approach is inefficient. ( Decoding GZIP is useless ).
Please can you elaborate? My package is not decoding gzip at the point you linked. The decode() call converts bytes to str.
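To make the distinction concrete, here is a small standard-library sketch. The sample markup is made up for illustration and is not taken from either package. It shows that decode() only interprets bytes as text, while undoing gzip needs an explicit gzip.decompress() call:

```python
import gzip

# A response body as bytes (made-up sample markup).
raw = "<p>héllo</p>".encode("utf-8")

# decode() maps bytes to str using a character encoding.
# No decompression happens here.
text = raw.decode("utf-8")

# Decompression is a separate, explicit step on gzip-compressed bytes.
compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)
```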
Also your approach doesn't work for special HTML tags like ( X-Init, X-Effect, X-On | Used by AlpineJS which is used in many Django project ).
Can you explain what you mean by "doesn't work"? The attributes are valid HTML attributes, and minify-html seems to process them just fine when I run the examples:
In [1]: import minify_html
In [2]: minify_html.minify('''<div x-init="console.log('I\'m being initialized!')"></div>''')
Out[2]: '<div x-init="console.log(\'I\'m being initialized!\')"></div>'
Any problem that minify-html has with them is something I'd consider fixing upstream in minify-html, rather than patch it up when wrapping it.
I briefly looked at your package's source code, and you're using a lot of regexes. I'm afraid that approach is flawed. You cannot parse HTML with a regex - it's too complicated. It's not a regular language. There's even this Stack Overflow meme answer that says if you try, you'll summon a pony.
To parse HTML, you need to use a real HTML parser like Python's html.parser (which is even not that good, since it doesn't handle many quirks or parts of the actual HTML spec). Here's a post that I wrote on using it for my blog.
The need for a real parser is why I'm excited to use minify-html (and only minify-html). It uses a very fast HTML parser.
A secondary problem I think your package has is that you only allow one compression algorithm to be active. That's not how HTTP encoding is intended to work. You should inspect the Accept-Encoding header to determine what compression algorithm the client supports, and use the best one available. This is what django-compression-middleware does. I also think the compression middleware is best separate from the HTML minification middleware. Compression can be applied to all responses, not just HTML ones.
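As a rough illustration of that negotiation, here is a minimal sketch in plain Python. The function names and the server-side preference order (br, then zstd, then gzip) are assumptions for the example, and this is not a copy of how django-compression-middleware implements it:

```python
from typing import Dict, Optional, Sequence


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Map each coding named in an Accept-Encoding header to its q-value."""
    weights: Dict[str, float] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        params = params.strip()
        q = 1.0  # a coding without an explicit q-value counts as q=1
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        weights[name.strip().lower()] = q
    return weights


def choose_encoding(
    header: str,
    preferred: Sequence[str] = ("br", "zstd", "gzip"),  # assumed server preference
) -> Optional[str]:
    """Pick the supported coding the client weights highest, or None."""
    weights = parse_accept_encoding(header)
    wildcard = weights.get("*", 0.0)
    best: Optional[str] = None
    best_q = 0.0
    for coding in preferred:
        q = weights.get(coding, wildcard)
        # Strict comparison keeps the earlier (more preferred) coding on ties.
        if q > best_q:
            best, best_q = coding, q
    return best
```

A middleware could call choose_encoding(request.headers.get("Accept-Encoding", "")) once per response and skip compression when it returns None.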
Thanks for writing. I hope this has given you some knowledge and ideas!
I don't believe we're quite rebuilding the same thing. I intended to only minify HTML. Your package takes a different approach - it's trying to do a lot more.
Correct. But the end goal is the same: minified HTML, plus some added functionality.
I don't believe we're quite rebuilding the same thing
I still believe that it's the same 😂.
Can you explain what you mean by "doesn't work"? The attributes are valid HTML attributes, and minify-html seems to process them just fine when I run the examples:
Take a look at this repo
I ran this with:
Code
import minify_html
x = minify_html.minify(open('index.html').read(),minify_css=True,minify_js=True)
print(x)
Result
{% load static %} <link href="{% static 'core/css/navbar/navbar.css' %}"rel=stylesheet><nav aria-label="main navigation"class="navbar container"x-data="{
open: false,
button_turned: false,
}"role=navigation x-cloak><div x-init="
() => {
switch ( $store.is_mobile ) {
case true : {
open = false;
button_turned = true;
break;
}
case false : {
open = true;
break;
}
}
}
"class=navbar-brand><a class="navbar-item is-clickable"href=https://bulma.io> <img src="{% static 'core/images/logo.png' %}"height=28 width=112> </a><a :class="open ? 'is-active' : ''"@click="open = !open"class="navbar-burger is-clickable"role="button "aria-expanded=false aria-label=menu data-target=navbarBasicExample> <span aria-hidden=true></span> <span aria-hidden=true></span> <span aria-hidden=true></span> </a></div><div class="navbar-menu is-active"id=mainNavbar x-collapse x-show=open><div x-show=open x-transition x-transition.delay.50ms><div :class="$store.is_mobile ? 'is-flex is-flex-direction-column' : ''"class="navbar-start mb-3"><button @click.prevent="
() => {
window.location = '/'
}
"@mouseenter="
() => {
anime({
targets: '.animejs__home__icon',
color: '#e50000',
})
}
"@mouseleave="
() => {
anime({
targets: '.animejs__home__icon',
color: '#d9d9d9',
})
}
"class="navbar-item button is-ghost is-rounded ml-1 is-unselectable"><ion-icon class=animejs__home__icon name=home-sharp></ion-icon> <p>Home</p></button><button @click.prevent="
() => {
window.location = '/blog/'
}
"@mouseenter="
() => {
anime({
targets: '.animejs__school__icon',
color: '#e50000',
})
}
"@mouseleave="
() => {
anime({
targets: '.animejs__school__icon',
color: '#d9d9d9',
})
}
"class="navbar-item button is-ghost is-rounded ml-2 is-unselectable"x-init="
() => {
const string = '{{ request.path }}';
if (string.includes('blog') ) {
$el.classList.add('hover')
}
}
"><ion-icon class=animejs__school__icon name=school-sharp></ion-icon> <p>Blog</p></button><button @mouseenter="
() => {
anime({
targets: '.animejs__projects__icon',
color: '#e50000',
})
}
"@mouseleave="
() => {
anime({
targets: '.animejs__projects__icon',
color: '#d9d9d9',
})
}
"class="navbar-item button is-ghost is-rounded ml-2 is-unselectable"x-init="
() => {
if ('{{ request.path }}' === '/projects/') {
$el.classList.add('hover')
}
}
"><ion-icon class=animejs__projects__icon name=extension-puzzle-sharp></ion-icon> <p>Projects</p></button></div></div><div class=navbar-end><div :class="$store.is_mobile ? 'has-text-centered' : ''"class=navbar-item><a :class="$store.is_mobile ? 'button': ''"@mouseenter="
() => {
anime({
targets: '.animejs__logo__facebook',
color: '#4169e1',
});
anime({
targets: '.animejs__facebook__button',
scale: 1.3
});
}
"@mouseleave="
() => {
anime({
targets: '.animejs__logo__facebook',
color: 'hsl(0, 0%, 80%)',
});
anime({
targets: '.animejs__facebook__button',
scale: 1
});
}
"class="is-rounded is-dark animejs__facebook__button is-clickable"href="{{ settings.site_settings.SocialMediaSettings.facebook }}"x-effect="
() => {
switch (button_turned) {
case true : {
anime({
targets: '.animejs__facebook__button',
translateX: 0,
easing: 'easeOutSine',
duration: 150,
opacity: 1 ,
scale: 1,
});
break;
};
case false: {
anime({
targets: '.animejs__facebook__button',
translateX: 40 * 2,
easing: 'easeOutSine',
duration: 150,
opacity: 0,
scale: 0.2,
});
break;
};
}
}
"x-init="
() => {
tippy('.animejs__facebook__button', {
content: 'Facebook',
});
}
"> <ion-icon :class="$store.is_mobile ? 'is-position-absolute' : ''"x-init="
() => {
anime({
targets: '.animejs__logo__facebook',
color: 'hsl(0, 0%, 80%)',
scale: 0.6,
});
}
"class=animejs__logo__facebook name=logo-facebook style=width:100%;height:100%></ion-icon> </a><a :class="$store.is_mobile ? 'button ': ''"@mouseenter="
() => {
anime({
targets: '.animejs__logo__github',
color: 'hsl(0, 0%, 100%)',
});
anime({
targets: '.animejs__github__button',
scale: 1.3
});
}
"@mouseleave="
() => {
anime({
targets: '.animejs__logo__github',
color: 'hsl(0, 0%, 80%)',
});
anime({
targets: '.animejs__github__button',
scale: 1
});
}
"class="is-rounded is-dark animejs__github__button is-clickable"href="{{ settings.site_settings.SocialMediaSettings.github }} "x-effect="
() => {
switch ( button_turned ) {
case true : {
anime({
targets: '.animejs__github__button',
translateX: 0,
easing: 'easeOutSine',
duration: 150,
opacity: 1 ,
scale: 1,
});
break;
};
case false : {
anime({
targets: '.animejs__github__button',
translateX: 40,
easing: 'easeOutSine',
duration: 150,
opacity: 0,
scale: 0.2,
});
break;
};
}
}
"x-init="
() => {
tippy('.animejs__github__button', {
content: 'Github',
});
}
"> <ion-icon :class="$store.is_mobile ? 'is-position-absolute' : ''"x-init="
() => {
anime({
targets: '.animejs__logo__github',
color: 'hsl(0, 0%, 80%)',
scale: 0.6,
})
}
"class=animejs__logo__github name=logo-github style=width:100%;height:100%></ion-icon> </a><a :class="$store.is_mobile ? 'is-hidden' : ''"@click.prevent="
() => {
anime({
targets: '.animejs__arrow__back',
rotate: button_turned ? 0 : 180,
});
button_turned = !button_turned;
}
"@mouseenter="
() => {
anime({
targets: '.animejs__arrow__back',
color: '#e50000',
});
anime({
targets: '.animejs__arrow__button',
scale: 1.2,
});
}
"@mouseleave="
() => {
anime({
targets: '.animejs__arrow__back',
color: 'hsl(0, 0%, 80%)',
});
anime({
targets: '.animejs__arrow__button',
scale: 1
});
}
"class="is-rounded is-dark animejs__arrow__button is-clickable"style=z-index:1000000> <ion-icon x-init="
() => {
anime({
targets: '.animejs__arrow__back',
easing: 'linear',
duration: 100,
color: 'hsl(0, 0%, 80%)',
scale: 0.6,
});
}
"class=animejs__arrow__back name=arrow-back-outline style=width:100%;height:100%></ion-icon> </a></div><template x-if=!$store.is_mobile>{% if request.user.is_authenticated %} <figure class="image is-48x48 pt-2 pl-2"><img src="{% static 'core/images/placeholder.png' %}"class=is-rounded></figure> {% else %} <div class=navbar-item><div class=buttons><button @click.prevent="
() => {
window.location = '/authentication/login/?next={{ request.path }}'
}
"class="button is-ghost is-rounded">Log in</button></div></div> {% endif %}</template></div></div></nav>
See the problem? ( AlpineJs is very popular with Django by the way ). I opened an Issue in upstream
You should inspect the Accept-Encoding header to determine what compression algorithm the client supports, and use the best one available.
Yep, I am doing that in here, but the approach is kind of flawed. I wanted to give the developer absolute control over their choice. @adamchainz thanks for giving me another way to approach it.
I briefly looked at your package's source code, and you're using a lot of regexes.
That's the only way I can think of that captures the special attributes. Feel free to correct me here.
I also think the compression middleware is best separate from the HTML minification middleware.
Okay? I actually used this with the GZIP middleware at first, and that's why I wrote it like that. But I feel like (feel free to ignore this) we will fall into middleware hell someday if we continue down this path.
The need for a real parser is why I'm excited to use minify-html (and only minify-html). It uses a very fast HTML parser.
python_strip_whitespace = minify-html + this
To parse HTML, you need to use a real HTML parser like Python's html.parser (which is even not that good, since it doesn't handle many quirks or parts of the actual HTML spec). Here's a post that I wrote on using it for my blog.
You should look into this. Note that python regex replacing is faster and more performant than doing what minify-html ( it wraps Golang in Rust and then wraps Rust in Python ) is doing.
A proof of this :
And the [code] behind it :
Code
# Minifiers
import time
import tracemalloc
from typing import Literal, Optional

from minify_html import minify as rust_minifier

from .html import (
    html_minify as python_minifier,
    mangle_nbsp,
    unmangle_nbsp,
)

# Guess the file content
from .functions import guess

# Import helper functions
from .html import add_line_break


def minify(
    buffer: bytes,
    # Rust
    STRIP_WHITESPACE_RUST_DO_NOT_MINIFY_DOCTYPE: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_ENSURE_SPEC_CONPLIANT_UNQUOTED_ATTRIBUTE_VALUES: Optional[
        bool
    ] = True,
    STRIP_WHITESPACE_RUST_KEEP_CLOSING_TAGS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_KEEP_COMMENTS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_KEEP_HTML_AND_HEAD_OPENING_TAGS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_KEEP_SPACES_BETWEEN_ATTRIBUTES: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_MINIFY_CSS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_MINIFY_JS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_REMOVE_BANGS: Optional[bool] = True,
    STRIP_WHITESPACE_RUST_REMOVE_PROCESSING_INSTRUCTIONS: Optional[bool] = True,
    # Python
    STRIP_WHITESPACE_PYTHON_REMOVE_COMMENTS: Optional[bool] = False,
    STRIP_WHITESPACE_PYTHON_CONDENSE_STYLE_FROM_HTML: Optional[bool] = True,
    STRIP_WHITESPACE_PYTHON_CONDENSE_SCRIPT_FROM_HTML: Optional[bool] = True,
    STRIP_WHITESPACE_PYTHON_CLEAN_UNNEEDED_HTML_TAGS: Optional[bool] = True,
    STRIP_WHITESPACE_PYTHON_CONDENSE_HTML_WHITESPACE: Optional[bool] = True,
    STRIP_WHITESPACE_PYTHON_UNQUOTE_HTML_ATTRIBUTES: Optional[bool] = True,
    # NBSP character setting
    STRIP_WHITESPACE_NBSP_MANGLE_CHARACTER: Optional[str] = "'অ'",
    # Compression Settings
    # Literal is the valid way to type a fixed set of string values.
    STRIP_WHITESPACE_COMPRESSION_TYPE: Literal[
        "compressed", "decompressed"
    ] = "decompressed",  # Let's default to decompressed bytes
) -> bytes:
    buffer_type: Literal["gzip", "br", "zstd", "plain"]
    decompressed_buffer: bytes = b""
    return_buffer: bytes = b""

    # We check if the HTML that the server sent us is compressed or decompressed.
    # If the buffer is decompressed then just set buffer type to plain
    if STRIP_WHITESPACE_COMPRESSION_TYPE == "compressed":
        buffer_type = guess(buffer).lower()
    elif STRIP_WHITESPACE_COMPRESSION_TYPE == "decompressed":
        buffer_type = "plain"

    # If the buffer is not plain text, check for compression type.
    # But if the buffer is just plain text, don't do unnecessary checks.
    if buffer_type == "plain":
        decompressed_buffer = buffer
    elif buffer_type == "gzip":
        from .functions.decompressors.gzip import decompress as gz_decompress

        decompressed_buffer = gz_decompress(buffer)
    elif buffer_type == "br":
        from .functions.decompressors.brotli import decompress as br_decompress

        decompressed_buffer = br_decompress(buffer)
    elif buffer_type == "zstd":
        from .functions.decompressors.zstd import decompress as zstd_decompress

        decompressed_buffer = zstd_decompress(buffer)

    # First change &nbsp; into a special character so the other minifiers can't minify it.
    first_iter: str = mangle_nbsp(
        decompressed_buffer.decode(),
        STRIP_WHITESPACE_NBSP_MANGLE_CHARACTER,
    )

    # Temporary instrumentation: time and trace the Rust minifier.
    tracemalloc.start()
    now = time.time()
    # Rust based minifier. The most powerful one in here. 💪
    second_iter: str = rust_minifier(
        first_iter,
        do_not_minify_doctype=STRIP_WHITESPACE_RUST_DO_NOT_MINIFY_DOCTYPE,
        ensure_spec_compliant_unquoted_attribute_values=STRIP_WHITESPACE_RUST_ENSURE_SPEC_CONPLIANT_UNQUOTED_ATTRIBUTE_VALUES,
        keep_closing_tags=STRIP_WHITESPACE_RUST_KEEP_CLOSING_TAGS,
        keep_comments=STRIP_WHITESPACE_RUST_KEEP_COMMENTS,
        keep_html_and_head_opening_tags=STRIP_WHITESPACE_RUST_KEEP_HTML_AND_HEAD_OPENING_TAGS,
        keep_spaces_between_attributes=STRIP_WHITESPACE_RUST_KEEP_SPACES_BETWEEN_ATTRIBUTES,
        minify_css=STRIP_WHITESPACE_RUST_MINIFY_CSS,
        minify_js=STRIP_WHITESPACE_RUST_MINIFY_JS,
        remove_bangs=STRIP_WHITESPACE_RUST_REMOVE_BANGS,
        remove_processing_instructions=STRIP_WHITESPACE_RUST_REMOVE_PROCESSING_INSTRUCTIONS,
    )
    end = time.time()
    print(f'''
    Rust Func
    Traced memory : {tracemalloc.get_traced_memory()},
    Time : {end-now}
    ''')
    tracemalloc.stop()

    # Rust minifier comes first to mitigate some of the issues I faced.😛
    # Specially the python module picks '\n in class=""
    # So first remove all unnecessary whitespace before adding line_break
    third_iter: str = add_line_break(second_iter)

    # Finally the python iterator.
    # I don't know how this works.🤷
    # So adding it at last
    # Temporary instrumentation: time and trace the Python minifier.
    tracemalloc.start()
    now = time.time()
    fourth_iter: str = python_minifier(
        third_iter,
        STRIP_WHITESPACE_PYTHON_REMOVE_COMMENTS,
        STRIP_WHITESPACE_PYTHON_CONDENSE_STYLE_FROM_HTML,
        STRIP_WHITESPACE_PYTHON_CONDENSE_SCRIPT_FROM_HTML,
        STRIP_WHITESPACE_PYTHON_CLEAN_UNNEEDED_HTML_TAGS,
        STRIP_WHITESPACE_PYTHON_CONDENSE_HTML_WHITESPACE,
        STRIP_WHITESPACE_PYTHON_UNQUOTE_HTML_ATTRIBUTES,
    )
    end = time.time()
    print(f'''
    Py Func
    Traced memory : {tracemalloc.get_traced_memory()},
    Time : {end-now}
    ''')
    tracemalloc.stop()

    # Replace the special character with &nbsp; again
    last_iter: str = unmangle_nbsp(
        fourth_iter,
        STRIP_WHITESPACE_NBSP_MANGLE_CHARACTER,
    )
    encoded_buffer: bytes = last_iter.encode()

    # Compress the buffer with the same algorithm it arrived in
    if buffer_type == "plain":
        return_buffer = encoded_buffer
    elif buffer_type == "gzip":
        from .functions.compressors.gzip import compress as gz_compress

        return_buffer = gz_compress(encoded_buffer)
    elif buffer_type == "br":
        from .functions.compressors.brotli import compress as br_compress

        return_buffer = br_compress(encoded_buffer)
    elif buffer_type == "zstd":
        from .functions.compressors.zstd import compress as zstd_compress

        return_buffer = zstd_compress(encoded_buffer)

    return return_buffer
Can you explain what you mean by "doesn't work"? The attributes are valid HTML attributes, and minify-html seems to process them just fine when I run the examples:
In [1]: import minify_html
In [2]: minify_html.minify('''<div x-init="console.log('I\'m being initialized!')"></div>''')
Out[2]: '<div x-init="console.log(\'I\'m being initialized!\')"></div>'
It seems that you should modify it to :
minify_html.minify(
'''
<div x-init = "
console.log('Holla!')
"> </div>
'''
, minify_css=True, minify_js=True)
see the problem.
Essentially this is my very first large project. At the time of writing I didn't know much about good code. So i made a ton of mistakes and it would mean a lot if someone points me in the right direction. I will happily modify my module to suit the audience at large.
Thanks for reading. @adamchainz please gimme your thoughts .
You should look into this. Note that python regex replacing is faster and more performant than doing what minify-html...
Performance at the cost of correctness is not worth it. The regex-based approach simply chokes on HTML that talks about, e.g., Alpine. For example here one of your regexes will capture from onwards:
For initial code use <pre>x-init="</pre> and type the appropriate code, ending with <pre>"</pre>.
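To illustrate the failure mode with that exact snippet, here is a small sketch. The regex is a hypothetical naive pattern written for this example, not one taken from either package. It is compared with a collector built on the standard library's html.parser, which only reports x-init when it is a real attribute on a start tag:

```python
import re
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical naive pattern, written only for this illustration.
NAIVE_X_INIT = re.compile(r'x-init="(.*?)"', re.DOTALL)


class XInitCollector(HTMLParser):
    """Collect x-init values that are real attributes on start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: List[Optional[str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "x-init":
                self.values.append(value)


def real_x_init_values(html: str) -> List[Optional[str]]:
    parser = XInitCollector()
    parser.feed(html)
    parser.close()
    return parser.values


docs_html = (
    'For initial code use <pre>x-init="</pre> and type the '
    'appropriate code, ending with <pre>"</pre>.'
)

# The regex treats text content as an attribute and swallows the markup
# between the two quote characters.
regex_capture = NAIVE_X_INIT.search(docs_html).group(1)

# The parser sees two <pre> elements containing text and no attributes.
parsed_values = real_x_init_values(docs_html)
```

Here the regex captures a span that starts inside the first pre element and ends inside the second, while the parser reports no x-init attribute at all.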
Also I doubt that running a bunch of Python functions is indeed faster. I would like to see a benchmark. I found that minify-html can minify 10k of HTML in 14 microseconds, which is barely time to make a single Python function call.
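For what it's worth, a fair comparison could be set up along these lines. This is only a sketch: best_time_per_call and collapse_whitespace are made-up names, and the regex function is a stand-in for a regex-based minifier, not code from either package. Using timeit with repeats gives steadier numbers than a single time.time() measurement:

```python
import re
import timeit
from typing import Callable


def best_time_per_call(
    func: Callable[[str], str],
    html: str,
    number: int = 200,
    repeat: int = 5,
) -> float:
    """Return the best observed seconds per call of func(html)."""
    timer = timeit.Timer(lambda: func(html))
    # Take the minimum over several runs to reduce noise from the system.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Stand-in for a regex-based minifier (hypothetical, for illustration only).
_WHITESPACE = re.compile(r"\s+")


def collapse_whitespace(html: str) -> str:
    return _WHITESPACE.sub(" ", html)


sample = "<div>\n  <p>Hello</p>\n</div>\n" * 100
regex_seconds = best_time_per_call(collapse_whitespace, sample)

# To compare against minify-html (assumes the package is installed):
# import minify_html
# rust_seconds = best_time_per_call(minify_html.minify, sample)
```

Both candidates should be timed on the same input and on realistic page sizes before drawing conclusions.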
See the problem? ( AlpineJs is very popular with Django by the way ). I opened an Issue in upstream
Great, I added a comment to clarify what your request is.
see the problem.
You still didn't communicate the problem in a single statement: that some bytes could still be saved. IMO this is not the end of the world.
Essentially this is my very first large project. At the time of writing I didn't know much about good code. So i made a ton of mistakes and it would mean a lot if someone points me in the right direction. I will happily modify my module to suit the audience at large.
I appreciate this. I think you've done very well for the effort you've put in. I would like to encourage you to keep on learning. There are plenty of resources out there. I'm afraid I don't have time to mentor you.
I'm going to keep my package as-is. It's minimal and matches my tastes. OSS isn't a "competition", several solutions can exist :)
My main feedback for your package is, again, that I would always avoid regex-based HTML parsing. Try using html.parser!
Good luck!