OPDS1: html text constructs must be unescaped
llemeurfr opened this issue
The Columbia feed, for instance, contains HTML summaries in its OPDS entries. They are formatted like this:
```xml
<summary type="html"><p>Benchmark for Faithful Digital Reproductions of Monographs and Serials. Version 1. .... Columbia University Catalog: go to CLIO</p>
<p>
<a href="https://clio.columbia.edu/catalog/14642100">Go to catalog record in CLIO.</a>
</p></summary>
```
Because an OPDS 1 feed is an extension of an Atom feed, the rules of Atom feeds apply, in particular https://tools.ietf.org/html/rfc4287#section-3.1.1.2:
> If the value of "type" is "html", the content of the Text construct MUST NOT contain child elements and SHOULD be suitable for handling as HTML [HTML]. Any markup within MUST be escaped; for example, "`<br>`" as "`&lt;br>`". HTML markup within SHOULD be such that it could validly appear directly within an HTML `<DIV>` element, after unescaping. Atom Processors that display such content MAY use that markup to aid in its display.
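In practice, the unescaping described in section 3.1.1.2 is what a conforming XML parser already does: in a `type="html"` Text construct the markup arrives as `&lt;p&gt;...`, and the parser returns the decoded string `<p>...`, which should then be handled as HTML rather than shown as plain text. The sketch below illustrates this with Python's `xml.etree.ElementTree`; the helper name `html_text_construct` is hypothetical, and rejecting child elements is just one possible way to react to the "MUST NOT contain child elements" rule.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def html_text_construct(elem: ET.Element) -> str:
    """Return the HTML markup carried by an Atom Text construct of type="html".

    Hypothetical helper. The XML parser has already decoded the escaped markup
    (&lt;p&gt; -> <p>), so elem.text is the HTML string to hand to an HTML renderer.
    """
    if elem.get("type", "text") != "html":
        raise ValueError('not a type="html" Text construct')
    if len(elem) > 0:
        # RFC 4287 section 3.1.1.2: type="html" content MUST NOT contain child elements.
        raise ValueError('type="html" Text construct must not contain child elements')
    return elem.text or ""


# A conforming entry: the summary markup is escaped inside the XML.
entry = f'''<entry xmlns="{ATOM_NS}">
  <summary type="html">&lt;p&gt;Benchmark for Faithful Digital Reproductions&lt;/p&gt;</summary>
</entry>'''
summary = ET.fromstring(entry).find(f"{{{ATOM_NS}}}summary")
print(html_text_construct(summary))  # <p>Benchmark for Faithful Digital Reproductions</p>
```

The returned string is real HTML markup; displaying it verbatim in a text view would show the tags to the reader, which is the symptom this issue describes.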
Such content (of type html) must therefore be unescaped, then sanitized, before being injected into the webview.
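A minimal sketch of the security cleaning step, assuming the summary string has already been decoded by the XML parser: an allowlist sanitizer built on Python's standard `html.parser`. The allowed tags and attributes, the `http`/`https` restriction on `href`, dropping `<script>`/`<style>` content, and the `sanitize_html` name are all illustrative assumptions, not taken from any existing reader; a production reader would more likely rely on a maintained sanitizer library.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlist (assumption): tags and attributes kept in the output.
ALLOWED_TAGS = {"p", "br", "a", "em", "strong", "i", "b", "ul", "ol", "li"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")
DROP_CONTENT = {"script", "style"}  # tags whose text content is discarded too


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()) or value is None:
                continue  # drops event handlers such as onclick
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript:, data:, etc.
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so a literal "<" in the source stays text.
            self.out.append(escape(data, quote=False))


def sanitize_html(fragment: str) -> str:
    """Return an allowlisted copy of an HTML fragment (hypothetical helper)."""
    parser = _Sanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize_html('<p><a href="https://clio.columbia.edu/catalog/14642100">Go to catalog record in CLIO.</a></p>')` returns its input unchanged, while `<script>` blocks and `onclick` attributes are stripped. Note that the string from the XML parser should not be passed through `html.unescape` a second time: an HTML summary may legitimately contain `&lt;` as literal text, and double-unescaping would turn it into live markup.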