Unbounded tree depth causes eventual segfault due to excessive stack frames
kevinhu opened this issue · 1 comment
kevinhu commented
Hi! This is a super useful tool. I'm using this package for a scraper, and I ran into a segfault when running minify_html 0.11.1 (default settings) with this particular website: https://gist.github.com/kevinhu/1c60437a9cecf3b8c741c3f006d35b8f
To reproduce:
import minify_html
with open("./bad_website.html", "r") as f:
    long_html = f.read()
minified = minify_html.minify(long_html)
I also tried using minify_html_onepass, which fails gracefully with the following error:
SyntaxError: Closing tag name does not match opening tag (expected "span", got "a"). [Character 2824653]
wilsonzlin commented
I did a quick test and this appears to be due to an extremely deep tree; there were over 4000 stack frames before the segfault. This isn't really solvable but I could implement a feature to limit the max parse depth.
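Until something like a depth limit exists in the library, one possible workaround is to estimate the nesting depth before calling the minifier and skip documents that look too deep. The sketch below uses only Python's standard-library `html.parser`, which processes tags iteratively and so should not hit the same stack problem. `DepthCounter` and `max_nesting_depth` are hypothetical helper names, not part of minify_html, and the count is only a rough estimate because mismatched closing tags (as in the page above) are not reconciled.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they do not add depth.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class DepthCounter(HTMLParser):
    """Tracks a rough maximum nesting depth of open tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Clamp at zero so stray closing tags cannot push the count negative.
        self.depth = max(0, self.depth - 1)


def max_nesting_depth(html):
    """Return an estimate of the deepest tag nesting in the given HTML string."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth
```

A caller could then compare `max_nesting_depth(long_html)` against a threshold of their own choosing and only call `minify_html.minify(long_html)` when the estimate is below it. The safe threshold is an assumption that would need to be found empirically, since the number of stack frames per nesting level is not stated in this thread.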