Bad handling of two consecutive bold words

Question

m417z opened this issue a year ago · 6 comments

Describe the bug
See example below.

To Reproduce

**a** b **c**

Code:

import markdown2
print(markdown2.markdown('**a** b **c**'))

Expected behavior

<p><strong>a</strong> b <strong>c</strong></p>

Actual:

<p><strong>a<em>* b *</em>c</strong></p>
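One way to get exactly this nesting is a greedy bold pattern that runs from the first ** to the last one, followed by an italic pass that pairs up the leftover asterisks. The Python sketch below reproduces the actual output that way. The patterns and the function names buggy_render and fixed_render are hypothetical and are not taken from markdown2.

```python
import re

# Hypothetical patterns for illustration only; they are not markdown2's regexes.
GREEDY_STRONG = re.compile(r"\*\*(.+)\*\*")              # .+ runs to the LAST "**"
LAZY_STRONG = re.compile(r"\*\*(?=\S)(.+?)(?<=\S)\*\*")  # stops at the first valid closer
EM = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*")


def buggy_render(text):
    # Greedy bold pass swallows "a** b **c", then the italic pass pairs
    # the leftover asterisks inside it.
    text = GREEDY_STRONG.sub(r"<strong>\1</strong>", text)
    return EM.sub(r"<em>\1</em>", text)


def fixed_render(text):
    # Lazy bold pass that requires a non-space character just inside each delimiter.
    text = LAZY_STRONG.sub(r"<strong>\1</strong>", text)
    return EM.sub(r"<em>\1</em>", text)


print(buggy_render("**a** b **c**"))  # <strong>a<em>* b *</em>c</strong>
print(fixed_render("**a** b **c**"))  # <strong>a</strong> b <strong>c</strong>
```

The lazy pattern stops at the first closing ** that follows a non-space character, so each bold word is wrapped on its own.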

Debug info
Version of library being used: 2.4.11

Any extras being used: None

Answer 1 · 2023-12-07T10:44:20.000Z

I can confirm this; it appears to have slipped into the 2.4.11 release. 2.4.10 did not suffer from this issue. Downgrading to 2.4.10 is thus a valid temporary workaround.

Not sure if this helps, but per git bisect the commit that broke this was 0eafad6 (from #531).
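A bisect like this can be automated with git bisect run, which treats exit status 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The sketch below is a hypothetical check script for that purpose; the file name, the constants and the function names are assumptions, and it relies on import markdown2 resolving to the markdown2.py of the commit currently checked out.

```python
# Hypothetical check for use with "git bisect run"; names and layout are assumptions.
INPUT = "**a** b **c**"
EXPECTED = "<p><strong>a</strong> b <strong>c</strong></p>"


def is_good(html):
    # The rendered HTML may end with a newline, so compare it stripped.
    return html.strip() == EXPECTED


def main():
    """Return a git bisect run exit code: 0 = good, 1 = bad, 125 = skip."""
    try:
        import markdown2  # must resolve to the markdown2.py in the current checkout
    except ImportError:
        return 125  # this commit cannot be tested, so ask git bisect to skip it
    return 0 if is_good(markdown2.markdown(INPUT)) else 1
```

To drive a bisect with it, save the file outside the repository (so checkouts do not change it), add a final line raise SystemExit(main()), mark the 2.4.11 release bad and 2.4.10 good (tag names depend on the repository), and run git bisect run python /path/to/check.py with PYTHONPATH pointing at the checkout directory that contains markdown2.py.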

Answer 2 · 2023-12-07T22:26:08.000Z

Thanks for the reports; we'll check this out.

Answer 3 · 2023-12-09T11:55:07.000Z

Hi, thanks for reporting this. I've opened a linked PR to address this, let me know if I've missed anything.

Answer 4 · 2023-12-09T18:21:42.000Z

Hi, thanks for the proposed fix.
I tested it, and spaces just inside ** are not handled well; otherwise it looks fine to me.

**Word** **Word **

Word *Word *

Answer 5 · 2023-12-09T19:30:24.000Z

That is intentional and should match the behaviour of previous versions. From the CommonMark spec:

A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and...
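The quoted definition is truncated. In full, the CommonMark spec also requires that a right-flanking run is either not preceded by punctuation, or preceded by punctuation and followed by whitespace or punctuation; left-flanking is the mirror image, and the start and end of the line count as whitespace. A minimal Python sketch of that classification follows. The function names are hypothetical, and whitespace and punctuation are approximated (str.isspace(), ASCII punctuation plus Unicode P* categories), so this is neither a conformant implementation nor markdown2's code.

```python
import string
import unicodedata


def _is_whitespace(ch):
    # None stands for the start or end of the line, which the spec treats as whitespace.
    # str.isspace() approximates the spec's "Unicode whitespace".
    return ch is None or ch.isspace()


def _is_punctuation(ch):
    # Approximation: ASCII punctuation plus Unicode "P*" categories.
    if ch is None:
        return False
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def flanking(text, start, end):
    """Classify the delimiter run text[start:end] as (left_flanking, right_flanking)."""
    before = text[start - 1] if start > 0 else None
    after = text[end] if end < len(text) else None
    left = not _is_whitespace(after) and (
        not _is_punctuation(after) or _is_whitespace(before) or _is_punctuation(before)
    )
    right = not _is_whitespace(before) and (
        not _is_punctuation(before) or _is_whitespace(after) or _is_punctuation(after)
    )
    return left, right


print(flanking("**Word **", 7, 9))  # (False, False)
print(flanking("**Word**", 6, 8))   # (False, True)
```

Under this rule the closing ** in "**Word **" is neither left- nor right-flanking, so it cannot close the bold and should stay literal.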

However, the  behaviour seems to be inconsistent with this; I'll add a commit to fix that.
EDIT: it looks like the  regex considers the first * of the closing delimiter as a match for (?<=\S)\*. That doesn't seem like an issue to me.
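A lookbehind such as (?<=\S) followed by a literal asterisk checks only the single character before that asterisk, and an asterisk is itself non-whitespace. So any asterisk that directly follows another asterisk satisfies it, even inside a closing run that has a space before it. A small Python demonstration, using a hypothetical italic pattern rather than markdown2's actual one:

```python
import re

# (?<=\S) looks at exactly one character: the one right before the asterisk.
STAR_AFTER_NON_SPACE = re.compile(r"(?<=\S)\*")
# Hypothetical italic pattern for illustration; not markdown2's actual regex.
EM = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*")

text = "**Word **"
positions = [m.start() for m in STAR_AFTER_NON_SPACE.finditer(text)]
print(positions)  # [1, 8]: each matched * directly follows another *

em_result = EM.sub(r"<em>\1</em>", text)
print(em_result)  # <em>*Word *</em>: outer asterisks paired, inner ones kept as text
```

Whether that pairing is acceptable is the judgment call made in this thread; the demo only shows why the lookbehind accepts it.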

Answer 6 · 2023-12-11T09:31:32.000Z

Thanks! I think you may still want to add an entry to the changelog for this fix; I've taken the liberty of submitting a PR for that (see #553).