lepture/mistune

[Feature request] Convert normal Markdown text to the MarkdownV2 text used in the Telegram Bot API

nghlt opened this issue · 6 comments

nghlt commented

MarkdownV2 text differs slightly from normal Markdown text. Please add a utility for converting between them.
Thank you.

What is MarkdownV2 text?

nghlt commented

It is a message format used in Telegram. The syntax is similar to standard Markdown but slightly different. In addition, there is a list of special characters that must be escaped with a backslash wherever they are not part of the markup.
For more information, see the official Telegram Bot API documentation: https://core.telegram.org/bots/api#markdownv2-style
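As a rough illustration of the escaping rule described above, here is a minimal sketch in plain Python. The function name escape_markdown_v2 is hypothetical and not part of mistune. The character list is the one the Telegram documentation says must be escaped outside of markup; escaping the backslash itself is an extra assumption made here so that literal backslashes survive.

```python
import re

# Characters the Telegram Bot API docs list as requiring a preceding
# backslash in MarkdownV2 when they are not part of the markup.
_MDV2_SPECIAL = r"_*[]()~`>#+-=|{}.!"


def escape_markdown_v2(text: str) -> str:
    """Backslash-escape every MarkdownV2 special character in plain text.

    Hypothetical helper, not a mistune function. The backslash is escaped
    too (an assumption of this sketch) so literal backslashes are preserved.
    """
    pattern = "([" + re.escape("\\" + _MDV2_SPECIAL) + "])"
    return re.sub(pattern, r"\\\1", text)


escaped = escape_markdown_v2("Hello. World!")
```

Note that this escapes everything, so it is only suitable for plain text that should carry no formatting. Converting real Markdown markup still needs a renderer that knows which characters are markup.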

You can create a custom renderer. Check the markdown renderer.

nghlt commented

There is a function named escape() in util.py. It looks like this:

def escape(s, quote=True):
    """Escape characters of ``&<>``. If quote=True, ``"`` will be
    converted to ``&quot;``."""
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

I don't know how to set quote=False when converting Markdown text to HTML text:

import mistune

markdown_text = 'This is a "markdown" text'

creator = mistune.create_markdown()

html_text = creator(markdown_text)

print(html_text)

The output is <p>This is a &quot;markdown&quot; text</p>. But I don't want to escape the " or '.
Is there any solution?
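To make the effect of the quote flag concrete, the escape() function quoted above can be called directly, outside of any renderer. This is only a sketch: the function body is copied from the snippet in this thread rather than imported from mistune, and the sample string is made up for illustration.

```python
def escape(s, quote=True):
    """Copy of the escape() helper quoted in this thread."""
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


sample = 'This is a "markdown" text & <b>'

# Default behaviour: double quotes become &quot;
escaped_all = escape(sample)

# With quote=False the double quotes are left untouched,
# while &, < and > are still escaped.
escaped_no_quote = escape(sample, quote=False)

print(escaped_all)
print(escaped_no_quote)
```

The open question is therefore not what escape() can do, but how to make the renderer pass quote=False when it calls it.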

You can create a new renderer class that inherits from HTMLRenderer and override every method that calls the escape function, rewriting each one identically but passing quote=False.

For example:

from mistune import HTMLRenderer, escape, safe_entity


class NoQuoteEscapeRenderer(HTMLRenderer):
    # mistune v3
    def text(self, text: str) -> str:
        if self._escape:
            # in the original method, escape is imported as escape_text
            return escape(text, quote=False)
        return safe_entity(text)
    ...

    # or, if using mistune v2, use this version instead
    # (it needs escape_html from mistune.util rather than safe_entity):
    # def text(self, text):
    #     if self._escape:
    #         return escape(text, quote=False)
    #     return escape_html(text)  # this function is named differently between the versions
    ...

You can then parse your markdown text like this:

import mistune

markdown_text = 'This is a "markdown" text'

creator = mistune.create_markdown(renderer=NoQuoteEscapeRenderer())

html_text = creator(markdown_text)

Thanks @rgrignon1.

This issue has been answered by @rgrignon1