buriy/python-readability

Some tags create unnecessary paragraphs

GabMus opened this issue · 1 comment

On some websites (Phoronix, for example), inline tags (notably a and em) are pulled out of the paragraph that contains them, so the surrounding text is split into separate, unnecessary paragraphs.

This causes unnecessary line breaks, ultimately hurting the page's readability. Here's an example.

Original article

Readability content:

<div xmlns:html="http://www.w3.org/1999/xhtml"><div class="content">
<p>
Disclosed back in November was the </p><a href="https://www.phoronix.com/vr.php?view=28462">Intel Jump Conditional Code (JCC) erratum</a><p> affecting Skylake and newer CPUs that could lead to "unpredictable behavior" when jump instructions cross cache lines. Intel issued a CPU microcode update to address the problem at a performance cost, but with some compiler toolchain magic, it's possible to mitigate a good portion of that impact.

</p><p>Intel had the </p><a href="https://www.phoronix.com/scan.php?page=news_item&amp;px=GNU-Assembler-Patches-JCC">GNU Assembler patches around JCC Erratum</a><p> sent out and similarly work going on within the LLVM camp given its ever increasing usage. LLVM developers have been </p><a href="https://www.phoronix.com/scan.php?page=news_item&amp;px=LLVM-Review-JCC-Offset">debating their own patches for helping to mitigate the JCC Erratum impact</a><p>.

</p><p>That LLVM debate is still ongoing but on Friday some preliminary work was merged in order to get the discussion moving forward. The </p><a href="https://github.com/llvm/llvm-project/commit/14fc20ca62821b5f85582bf76a467d412248c248">initial patch</a><p> landing in LLVM 10 explained:
</p><blockquote>WARNING: If you're looking at this patch because you're looking for a full performace mitigation of the Intel JCC Erratum, this is not it!
<br></br>
<br></br>This is a preliminary patch on the patch towards mitigating the performance regressions caused by Intel's microcode update for Jump Conditional Code Erratum.
<br></br>
<br></br>The patch adds the required assembler infrastructure and command line options needed to exercise the logic for INTERNAL TESTING.  These are NOT public flags, and should not be used for anything other than LLVM's own testing/debugging purposes.  They are likely to change both in spelling and meaning.
<br></br>
<br></br>WARNING: This patch is knowingly incorrect in some cornercases.  We need, and do not yet provide, a mechanism to selective enable/disable the padding. Conversation on this will continue in parallel with work on extending this infrastructure to support prefix padding.
<br></br>
<br></br>The goal here is to have the assembler align specific instructions such that they neither cross or end at a 32 byte boundary.</blockquote>
<p>So the flags aren't yet public and it's not yet stable, but it's a start and will hopefully move the discussion forward. Though in mid-January is the LLVM / Clang 10.0 feature freeze so hopefully the work can come together in time. The LLVM work continues to be discussed </p><a href="https://reviews.llvm.org/D70157#1793280">here</a><p>.
</p><p align="center"><a href="//www.phoronix.com/image-viewer.php?id=intel-skylake-xeon9&amp;image=intel_sklxeon_1_lrg" target="_blank"><img src="//www.phoronix.net/image.php?id=intel-skylake-xeon9&amp;image=intel_sklxeon_1_med"></img></a></p>
<p>In upstream Binutils, </p><a href="https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git&amp;a=search&amp;h=HEAD&amp;st=commit&amp;s=JCC">their JCC Erratum patches landed</a><p> just over a week ago without much fanfare. The patches landed there though require setting </p><em>-mbranches-within-32B-boundaries</em><p> for the GNU Assembler (GAS) as it's not enabled by default. When it comes to the GNU toolchain support, the only Linux distribution I am aware of shipping with patched support to help offset that performance impact is Clear Linux where the mitigated behavior is applied by default.</p></div>
							</div>
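
Until this is addressed in the library itself, one possible workaround is to post-process the output and re-join paragraphs that were split around a single inline element. The following is only a minimal sketch, not a fix for the underlying bug: `rejoin_inline` is a hypothetical helper (not part of python-readability), and it assumes the output has exactly the `</p><a ...>text</a><p>` shape shown above, with text-only content inside the inline tag. A regex is used for brevity; a real fix would more likely work on the parsed tree.

```python
import re

# Inline tags mentioned in this issue; extending this list is an assumption.
INLINE_TAGS = ("a", "em")

# Matches "</p><tag ...>text</tag><p>" where the inline element holds only text.
_SPLIT_RE = re.compile(
    r"</p>(<(?P<tag>%s)\b[^>]*>[^<]*</(?P=tag)>)<p>" % "|".join(INLINE_TAGS)
)


def rejoin_inline(html: str) -> str:
    """Merge two paragraphs that were split around one inline element.

    Hypothetical post-processing helper: it drops the "</p>" before and the
    "<p>" after the inline element, so the element ends up inside a single
    paragraph again. Anything that does not match the exact shape is left
    untouched.
    """
    return _SPLIT_RE.sub(r"\1", html)


if __name__ == "__main__":
    broken = (
        "<p>Intel had the </p>"
        '<a href="https://example.com/jcc">GNU Assembler patches</a>'
        "<p> sent out.</p>"
    )
    print(rejoin_inline(broken))
```

Note that this would also merge paragraphs around a link that was intentionally placed between two paragraphs in the original page, so it is best applied only to sites known to show the problem.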

Any updates on this?