Unclosed tag is causing reading length calculation to fail
Closed this issue · 0 comments
sentry-io commented
Malformed HTML in this content (apparently an unclosed tag) causes the SAX parser used for docx export to throw an exception.
The same code is also used by the function that calculates reading length, so the failure has the much nastier effect of making the section itself inaccessible.
To fix this:
- Try to better handle the malformed HTML
- Separately, make reading-length calculation more resilient to downstream errors. Ideally we should never fail to display a section of content just because it's malformed, even if we'd eventually fail to serialize it for export (a much rarer occasion).
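
A rough sketch of the resilience idea, not the actual fix: wrap the export-rendering call so that a failure there degrades to "no reading length" instead of an exception that blocks the page. `safe_reading_length` and `render_for_export` are hypothetical names (the latter stands in for `annotated_content_for_export`), and using `len()` of the output is only a placeholder for whatever metric `calculate_reading_length` really computes.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_reading_length(
    content: str, render_for_export: Callable[[str], str]
) -> Optional[int]:
    """Return a length for the rendered content, or None if rendering fails.

    `render_for_export` is a hypothetical stand-in for the export renderer;
    `len()` is a placeholder metric, not the real calculation.
    """
    try:
        html_out = render_for_export(content)
    except Exception:
        # Log with traceback so the error still reaches monitoring,
        # but do not let it prevent the section from being displayed.
        logger.exception("Could not render content for reading-length calculation")
        return None
    return len(html_out)
```

Callers would then need to treat `None` as "length unknown" when displaying the section.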
h/t @bensteinberg for bringing this to my attention
Sentry Issue: H2O-9D
TypeError: 'ContentNode' object is not subscriptable
File "django/template/base.py", line 829, in _resolve_lookup
current = current[bit]
SaxError: Unexpected element closed: a
(47 additional frame(s) were not displayed)
...
File "main/models.py", line 2715, in calculate_reading_length
html_out = annotated_content_for_export(self)
File "main/export.py", line 272, in annotated_content_for_export
dest_tree = handler.get_output_tree()
File "main/export.py", line 239, in get_output_tree
method(*args)
File "src/lxml/sax.py", line 144, in lxml.sax.ElementTreeContentHandler.endElement
File "src/lxml/sax.py", line 134, in lxml.sax.ElementTreeContentHandler.endElementNS
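
For the first point (handling the malformed HTML), one possible approach is to balance the stream of start/end events before replaying it into the lxml handler, so that an end event with no matching open element never reaches `endElementNS`. The sketch below is an assumption about how that could look, not code from `main/export.py`: `balance_events` and the `(kind, tag)` tuple format are hypothetical, and it silently drops stray closing tags, which may or may not be the right policy for this content.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # hypothetical format: (kind, tag name or text)


def balance_events(events: Iterable[Event]) -> Iterator[Event]:
    """Yield events with stray end tags dropped and unclosed tags closed."""
    open_tags: List[str] = []
    for kind, value in events:
        if kind == "start":
            open_tags.append(value)
            yield kind, value
        elif kind == "end":
            if value not in open_tags:
                continue  # close with no matching open element: drop it
            # Close any inner elements still open, then the requested one.
            while open_tags:
                top = open_tags.pop()
                yield "end", top
                if top == value:
                    break
        else:
            yield kind, value  # pass other event kinds through unchanged
    # Close whatever is still open at the end of the input.
    while open_tags:
        yield "end", open_tags.pop()
```

With this in place, a stream like `start p, end a, end p` would be replayed as `start p, end p`, and `start p, start a, end p` would gain an `end a` before the `end p`.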