buriy/python-readability

Orphan links in doc.summary()

adbar opened this issue · 0 comments

adbar commented

Hi,

A user ran into this bug: adbar/trafilatura#21
Some links end up as orphans between paragraphs, outside of any `<p>` element, which breaks text rendering and conversion. The problem originates in the output of readability-lxml:

<p>Среди жанров многопользовательских игр MMOFPS занимают одну из лидирующих позиций, наряду с </p><a href="https://gametarget.ru/mmorpg/">MMORPG</a><p> и </p><a href="https://gametarget.ru/feature/moba/">MOBA</a><p>.

(Source page: https://gametarget.ru/mmofps/)
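In case it helps with triage, here is a minimal sketch of how the pattern in the snippet above could be detected. It uses only the standard library and does not call readability-lxml itself. The names `OrphanLinkFinder` and `find_orphan_links` are hypothetical helpers made up for this illustration, and the sample string is a shortened, closed version of the output quoted above with the Russian text replaced by placeholders.

```python
from html.parser import HTMLParser


class OrphanLinkFinder(HTMLParser):
    """Collect href values of <a> tags that open outside any <p> element."""

    def __init__(self):
        super().__init__()
        self.p_depth = 0   # number of currently open <p> elements
        self.orphans = []  # hrefs of links found outside a paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "a" and self.p_depth == 0:
            # A link that starts while no paragraph is open is an "orphan".
            self.orphans.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1


def find_orphan_links(html):
    """Return the hrefs of all links that sit outside a <p> element."""
    finder = OrphanLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.orphans


# Shortened stand-in for the summary output quoted in this issue.
sample = (
    '<p>text before </p><a href="https://gametarget.ru/mmorpg/">MMORPG</a>'
    '<p> and </p><a href="https://gametarget.ru/feature/moba/">MOBA</a><p>.</p>'
)
print(find_orphan_links(sample))
```

A link that stays inside its paragraph, for example `<p>see <a href="x">x</a></p>`, is not reported, so a check like this could serve as a regression test once the summary output is fixed.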

Could you please have a look at it?