go-shiori/shiori

Parsing error and missing content on theregister.com

lgrn opened this issue · 0 comments

lgrn commented

Data

  • Shiori version: 1.6.0 (build 595cb45)
  • Database Engine: sqlite
  • Operating system: Debian 12
  • CLI/Web interface/Web Extension: None

Describe the bug / actual behavior

Shiori fails to parse quoted passages in the article. They are missing from the saved content.

Expected behavior

The quotes are part of the article and should be included in the saved content. Ideally the UI would mark them as quotes, but at the very least they should not be dropped.

To Reproduce

Steps to reproduce the behavior:

  1. Save the article https://www.theregister.com/2024/03/18/truenas_abandons_freebsd/
  2. Inspect the saved content
  3. Note that the paragraph beginning with "The creator of PC-BSD(...)" has been saved
  4. Note that the following quote beginning with "Right now the plan(...)" is missing

Notes

This is an HTML excerpt of the problematic section. The <p> inside the <div class="blockextract"> is the one missing from the saved content:

<p>The creator of PC-BSD(...)</p>
<div class="blockextract">
<p>Right now the plan(...)</p>
</div>
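
To make the check in steps 3 and 4 less manual, here is a rough sketch of how the missing paragraph could be detected. It is not Shiori code and does not call Shiori or its parser. The helper names `paragraphs` and `missingFrom` are made up for illustration, and the `saved` string is an assumed stand-in for what Shiori stored, based on the behavior described above. It uses only the Go standard library, so it assumes the excerpt is well-formed enough for `encoding/xml`.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// paragraphs returns the trimmed text of every <p> element in a
// well-formed HTML fragment. Hypothetical helper, not part of Shiori.
func paragraphs(fragment string) ([]string, error) {
	// Wrap the fragment in a root element so encoding/xml accepts
	// several top-level elements.
	dec := xml.NewDecoder(strings.NewReader("<root>" + fragment + "</root>"))
	var out []string
	var buf strings.Builder
	inP := false
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "p" {
				inP = true
				buf.Reset()
			}
		case xml.CharData:
			if inP {
				buf.Write(t)
			}
		case xml.EndElement:
			if t.Name.Local == "p" {
				out = append(out, strings.TrimSpace(buf.String()))
				inP = false
			}
		}
	}
}

// missingFrom lists the source paragraphs whose text does not occur
// anywhere in the saved content. Hypothetical helper.
func missingFrom(source []string, saved string) []string {
	var missing []string
	for _, p := range source {
		if !strings.Contains(saved, p) {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	// The excerpt from the Notes section, with the elisions kept as-is.
	source := `<p>The creator of PC-BSD(...)</p>
<div class="blockextract">
<p>Right now the plan(...)</p>
</div>`
	// Assumption: what the saved content looks like per this report.
	saved := `<p>The creator of PC-BSD(...)</p>`

	ps, err := paragraphs(source)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range missingFrom(ps, saved) {
		fmt.Printf("missing from saved content: %q\n", p)
	}
}
```

With the assumed `saved` value, only the paragraph inside the `blockextract` div is reported as missing, which matches what I see in the saved article.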