trentm/python-markdown2

Bad handling of two consecutive bold words

m417z opened this issue · 6 comments

m417z commented

Describe the bug
See example below.

To Reproduce

**a** b **c**

Code:

import markdown2

print(markdown2.markdown('**a** b **c**'))

Expected behavior

<p><strong>a</strong> b <strong>c</strong></p>

Actual:

<p><strong>a<em>* b *</em>c</strong></p>

Debug info
Version of library being used: 2.4.11

Any extras being used: None


fghaas commented

I can confirm this; it appears to have slipped into the 2.4.11 release. 2.4.10 did not suffer from this issue. Downgrading to 2.4.10 is thus a valid temporary workaround.

Not sure if this helps, but per git bisect the commit that broke this was 0eafad6 (from #531).

Thanks for the reports; we'll check this out.

Hi, thanks for reporting this. I've opened a linked PR to address this, let me know if I've missed anything.

gitbra commented

Hi, thanks for the proposed fix.
I tested it, and spaces inside ** are not handled well; otherwise it looks fine to me.

**Word** **Word **

<p><strong>Word</strong> <em>*Word *</em></p>

That is intentional and should match the behaviour of previous versions. From the CommonMark spec:

A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and...

However, the <em> behaviour seems to be inconsistent with this. I'll add a commit to fix that.
EDIT: it looks like, in the <em> regex, the first * of the closing delimiter satisfies the (?<=\S) lookbehind, so the second * is accepted as the closing delimiter. That doesn't seem like an issue to me.
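To make the lookbehind explanation concrete, here is a minimal sketch using only Python's `re` module. The two patterns and the `render_inline` helper are assumptions written for illustration. They are modelled on the regex fragment quoted above, not copied from markdown2's source, and they skip everything else the library does (paragraph wrapping, escaping, extras).

```python
import re

# Simplified patterns, assumed for illustration; NOT markdown2's actual source.
# Opening delimiter must be followed by non-whitespace, and the closing
# delimiter must be preceded by non-whitespace.
STRONG_RE = re.compile(r"(\*\*|__)(?=\S)(.+?)(?<=\S)\1", re.S)
EM_RE = re.compile(r"(\*|_)(?=\S)(.+?)(?<=\S)\1", re.S)


def render_inline(text: str) -> str:
    """Hypothetical two-pass renderer: strong first, then em."""
    text = STRONG_RE.sub(r"<strong>\2</strong>", text)
    return EM_RE.sub(r"<em>\2</em>", text)


# "**Word **" cannot close as strong: the closing ** is preceded by a space.
# The em pass then opens at the first *, and the lookbehind before the very
# last * sees the previous * (non-whitespace), so the match succeeds.
print(render_inline("**Word** **Word **"))
# <strong>Word</strong> <em>*Word *</em>

print(render_inline("**a** b **c**"))
# <strong>a</strong> b <strong>c</strong>
```

With these assumed patterns, the leftover `*Word *` inside `<em>` is a direct consequence of `\S` matching an asterisk, which is consistent with the output reported above.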

fghaas commented

Thanks! I think you may still want to add an entry to the changelog for this fix — I've taken the liberty to submit a PR for that (see #553).