datadesk/django-softhyphen

hyphenator should specify a parser for BeautifulSoup

Closed this issue · 0 comments

jrief commented

django-softhyphen works perfectly if html5lib is not installed. However, with html5lib on your Python search path, the example code from the documentation

>>> from softhyphen.html import hyphenate
>>> hyphenate("<h1>I love hyphenation</h1>")
u'<html><body><h1>I love hy&shy;phen&shy;a&shy;tion</h1></body></html>'

returns the result wrapped in <html><body>...</body></html>, which is not what we want.

This can be worked around by monkey-patching BeautifulSoup.DEFAULT_BUILDER_FEATURES = ['html.parser'], but that changes the default parser for every other user of BeautifulSoup in the same process and may cause unwanted side effects. A better approach would be to add a configuration setting to django-softhyphen, which is then used in

html.py (line 54)

soup = BeautifulSoup(html, features=BEAUTIFULSOUP_BUILDER_FEATURES)

where BEAUTIFULSOUP_BUILDER_FEATURES defaults to ['html.parser'].
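
A minimal sketch of how such a setting might be resolved, assuming the bs4 package. The helper names get_builder_features and make_soup are hypothetical and not part of django-softhyphen; in the real package the settings object would presumably be django.conf.settings, and a plain namespace stands in for it here.

```python
from types import SimpleNamespace

# Proposed default: the stdlib parser, which does not wrap fragments
# in <html><body>.
DEFAULT_BUILDER_FEATURES = ["html.parser"]


def get_builder_features(settings):
    """Return the configured parser features as a list (hypothetical helper).

    Falls back to DEFAULT_BUILDER_FEATURES when the setting is absent.
    A bare string such as "html5lib" is accepted and wrapped in a list.
    """
    features = getattr(
        settings, "BEAUTIFULSOUP_BUILDER_FEATURES", DEFAULT_BUILDER_FEATURES
    )
    if isinstance(features, str):
        return [features]
    return list(features)


def make_soup(html, settings):
    """Build the soup with an explicit parser (hypothetical helper)."""
    # Imported lazily so the settings logic can be used without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(html, features=get_builder_features(settings))


# Stand-ins for a Django settings object.
unconfigured = SimpleNamespace()
configured = SimpleNamespace(BEAUTIFULSOUP_BUILDER_FEATURES="html5lib")

print(get_builder_features(unconfigured))
print(get_builder_features(configured))
```

Projects that rely on the html5lib behaviour could keep it by setting BEAUTIFULSOUP_BUILDER_FEATURES explicitly, while everyone else would get the unwrapped output regardless of which parsers happen to be installed.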

If you accept this feature request, I'll send a PR.