Pycord-Development/pycord

An attribute that stores the Markdown properties of the message

denisnumb opened this issue · 1 comment

Summary

An attribute of discord.Message that would store the Markdown properties of the message text, for example bold=True, italic=True, and so on.

What is the feature request for?

The core library

The Problem

I need to get a list of Markdown tokens from the message text, parsed exactly the way Discord processes Markdown.

I have a class MarkdownToken:

from typing import Optional


class MarkdownToken:
    """One run of message text together with the Markdown styles applied to it."""

    def __init__(
        self,
        raw_text: str,
        text: Optional[str] = None,
        *,
        url: Optional[str] = None,
        bold: bool = False,
        italic: bool = False,
        underlined: bool = False,
        strikethrough: bool = False,
        spoiler: bool = False,
    ) -> None:
        self.raw_text = raw_text        # text as typed, including Markdown syntax
        self.text = text or raw_text    # text as rendered, without Markdown syntax
        self.url = url
        self.bold = bold
        self.italic = italic
        self.underlined = underlined
        self.strikethrough = strikethrough
        self.spoiler = spoiler

    @property
    def is_url(self) -> bool:
        return self.url is not None
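One detail matters when this class is used in the test dictionary: without an __eq__ method, two MarkdownToken objects with identical fields compare unequal, so comparing a parser's output list with an expected list always fails. A minimal sketch of an equivalent dataclass, which generates __eq__ and __repr__ automatically. Unlike the class above, it does not make url and the style flags keyword-only; on Python 3.10+ a dataclasses.KW_ONLY marker could restore that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkdownToken:
    raw_text: str                   # text as typed, including Markdown syntax
    text: Optional[str] = None      # rendered text; defaults to raw_text
    url: Optional[str] = None
    bold: bool = False
    italic: bool = False
    underlined: bool = False
    strikethrough: bool = False
    spoiler: bool = False

    def __post_init__(self) -> None:
        # Same fallback as the original `text or raw_text`.
        if not self.text:
            self.text = self.raw_text

    @property
    def is_url(self) -> bool:
        return self.url is not None
```

With this version, MarkdownToken('**bold**', 'bold', bold=True) == MarkdownToken('**bold**', 'bold', bold=True) is True, so whole expected lists can be compared with ==.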

I also have a dictionary of test cases, where each key is the original message text and each value is the expected list of MarkdownToken objects for that text.

For example:

{
    'text without markdown': [
        MarkdownToken('text without markdown')
    ],

    '**bold text**': [
        MarkdownToken('**bold text**', 'bold text', bold=True)
    ],

    '*italic text*': [
        MarkdownToken('*italic text*', 'italic text', italic=True)
    ],
    
    '_italic text_': [
        MarkdownToken('_italic text_', 'italic text', italic=True)
    ],

    '__underlined text__': [
        MarkdownToken('__underlined text__', 'underlined text', underlined=True)
    ],

    '~~strikethrough text~~': [
        MarkdownToken('~~strikethrough text~~', 'strikethrough text', strikethrough=True)
    ],

    '||spoiler||': [
        MarkdownToken('||spoiler||', 'spoiler', spoiler=True)
    ],

    'https://github.com/': [
        MarkdownToken('https://github.com/', url='https://github.com/')
    ],

    '[url](https://github.com/)': [
        MarkdownToken('[url](https://github.com/)', 'url', url='https://github.com/')
    ],

    '***bold and italic text***': [
        MarkdownToken('***bold and italic text***', 'bold and italic text', bold=True, italic=True)
    ],

    '___underlined italic text___': [
        MarkdownToken('___underlined italic text___', 'underlined italic text', italic=True, underlined=True)
    ],

    '_**bold and italic text**_': [
        MarkdownToken('_**bold and italic text**_', 'bold and italic text', bold=True, italic=True)
    ],

    '~~__***bold, italic, strikethrough and underlined text***__~~': [
        MarkdownToken('~~__***bold, italic, strikethrough and underlined text***__~~', 'bold, italic, strikethrough and underlined text', bold=True, italic=True, underlined=True, strikethrough=True)
    ],

    'ordinary, *italic* and **bold**': [
        MarkdownToken('ordinary, '), 
        MarkdownToken('*italic*', 'italic', italic=True), 
        MarkdownToken(' and '), 
        MarkdownToken('**bold**', 'bold', bold=True)
    ],

    'text, [url](https://github.com/) and **bold text**': [
        MarkdownToken('text, '), 
        MarkdownToken('[url](https://github.com/)', 'url', url='https://github.com/'), 
        MarkdownToken(' and '), 
        MarkdownToken('**bold text**', 'bold text', bold=True)
    ],

    '**bold text** and url: https://github.com/': [
        MarkdownToken('**bold text**', 'bold text', bold=True), 
        MarkdownToken(' and url: '), 
        MarkdownToken('https://github.com/', url='https://github.com/')
    ],

    '**https://github.com/** — bold url': [
        MarkdownToken('**https://github.com/**', 'https://github.com/', url='https://github.com/', bold=True), 
        MarkdownToken(' — bold url')
    ],

    '[__**bold underlined link**__](https://github.com/)': [
        MarkdownToken('[__**bold underlined link**__](https://github.com/)', 'bold underlined link', url='https://github.com/', bold=True, underlined=True)
    ]
}
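Because every token keeps its raw_text, the test dictionary can check itself: the raw pieces of each expected list should concatenate back to the key exactly. A minimal sketch of that consistency check; inconsistent_cases is a hypothetical helper name, and MarkdownToken is trimmed to the fields the example needs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MarkdownToken:
    # Trimmed stand-in for the full class; only raw_text matters here.
    raw_text: str
    text: Optional[str] = None
    url: Optional[str] = None
    bold: bool = False
    italic: bool = False


def inconsistent_cases(cases: Dict[str, List[MarkdownToken]]) -> List[str]:
    """Return the keys whose tokens' raw_text does not join back into the key."""
    return [
        key
        for key, tokens in cases.items()
        if "".join(token.raw_text for token in tokens) != key
    ]


cases = {
    "ordinary, *italic* and **bold**": [
        MarkdownToken("ordinary, "),
        MarkdownToken("*italic*", "italic", italic=True),
        MarkdownToken(" and "),
        MarkdownToken("**bold**", "bold", bold=True),
    ],
    # Deliberately broken: the space between the two tokens is missing.
    "plain **bold**": [
        MarkdownToken("plain"),
        MarkdownToken("**bold**", "bold", bold=True),
    ],
}
print(inconsistent_cases(cases))  # ['plain **bold**']
```

A check like this catches typos in the expected lists before they are blamed on the parser under test.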

Of course, since the raw message text is available, you could run it through a Markdown parser. Unfortunately, none of the existing parsers process Markdown the way Discord does.

I tried marko, mistletoe, mistune, discord_markdown_ast_parser, and discord_markdown.

Each of them has problems. For example:

Most parsers treat __underlined__ as bold text rather than underlined text.

Some parsers mishandle combined markers. For example, ___underlined italic text___ is parsed only as underlined: the italic marker is ignored, and the text is left as _underlined italic text_.
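The delimiter rules at stake here can be sketched as a small function that peels matching markers off both ends of a single, fully wrapped span. This is a minimal sketch under assumptions: the delimiter-to-style table reflects Discord's rendering as described in this issue, unwrap is a hypothetical helper name, and it only handles text that one span wraps completely (mixed text such as *a* and *b* must be split into segments first).

```python
from typing import Set, Tuple

# Assumed Discord delimiter table. Two-character markers come before their
# one-character prefixes, so "**" wins over "*" and "__" wins over "_".
DELIMITERS = [
    ("||", "spoiler"),
    ("~~", "strikethrough"),
    ("__", "underlined"),
    ("**", "bold"),
    ("*", "italic"),
    ("_", "italic"),
]


def unwrap(raw: str) -> Tuple[str, Set[str]]:
    """Peel matching delimiters off both ends of a fully wrapped span.

    Returns the inner text and the set of styles the peeled delimiters apply.
    """
    text, styles = raw, set()
    peeled = True
    while peeled:
        peeled = False
        for delim, style in DELIMITERS:
            n = len(delim)
            # Require a non-empty inner text so "****" is not peeled to "".
            if len(text) > 2 * n and text.startswith(delim) and text.endswith(delim):
                styles.add(style)
                text = text[n:-n]
                peeled = True
                break  # restart from the longest delimiters
    return text, styles
```

For example, unwrap('___underlined italic text___') consumes "__" first and then "_", returning ('underlined italic text', {'underlined', 'italic'}), which matches the expected token in the test dictionary.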

Almost all parsers also have problems processing escaped characters.

Discord handles them like this:

  • \**text**: **text**
  • *\*text**: **text**
  • **text\**: *text*
  • **text*\*: *text*

GitHub, by contrast, renders them like this:

  • \**text**: *text*
  • *\*text**: *text*
  • **text\**: *text*
  • **text*\*: *text*

I need the Markdown parsed exactly the way Discord processes it.
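Backslash escapes are easier to handle if they are resolved before any delimiter matching, so that an escaped * can never open or close a span. A minimal sketch of that first pass, assuming the CommonMark rule that a backslash escapes only ASCII punctuation (whether Discord uses exactly this set is an assumption); mark_escapes is a hypothetical name. On its own it does not reproduce the Discord results listed above, which also depend on how the remaining unescaped delimiters pair up.

```python
import string
from typing import List, Tuple


def mark_escapes(text: str) -> List[Tuple[str, bool]]:
    """Split text into (character, escaped) pairs.

    A backslash followed by ASCII punctuation is dropped and the next
    character is marked as escaped (literal). Any other backslash, including
    a trailing one, is kept as an ordinary character.
    """
    pairs: List[Tuple[str, bool]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in string.punctuation:
            pairs.append((text[i + 1], True))
            i += 2
        else:
            pairs.append((ch, False))
            i += 1
    return pairs
```

A delimiter matcher running after this pass would treat only unescaped *, _, ~ and | characters as candidate markers.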

The Ideal Solution

The ideal solution would be a discord.Message attribute that stores the parsed message content as a list of tokens in a common format. For example:

@bot.event
async def on_message(message: discord.Message) -> None:
    print(message.content)  # 'text, [url](https://github.com/) and **bold text**'
    print(message.markdown) # [
                            #   Token(content='text, '),
                            #   Token(content='url', url='https://github.com/'),
                            #   Token(content=' and '),
                            #   Token(content='bold text', bold=True)
                            # ]

The Current Solution

Currently, the only way to solve this is to write your own parser, or to modify an existing one, so that the text is processed exactly as Discord processes it.
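As a starting point for such a parser, the top level can be handled by splitting the message into plain-text runs and marked-up segments. A minimal regex-based sketch; SEGMENT_RE and split_segments are hypothetical names, and it is a heuristic rather than Discord's actual grammar: it ignores escapes, code blocks, and Discord's precedence rules. The negative lookaheads make a closing delimiter absorb a whole run of the same character, so nested spans such as ***bold and italic text*** stay in one segment.

```python
import re
from typing import List

# Alternatives are tried left to right at each position: links first, then
# the longer delimiters before their one-character prefixes. The (?!x)
# lookahead stops a closing delimiter from matching in the middle of a run
# such as "***", so "***text***" is kept as one segment.
SEGMENT_RE = re.compile(
    r"\[[^\]]+\]\(https?://[^)\s]+\)"  # masked link: [text](url)
    r"|https?://\S+"                   # bare URL
    r"|\|\|.+?\|\|(?!\|)"              # ||spoiler||
    r"|~~.+?~~(?!~)"                   # ~~strikethrough~~
    r"|__.+?__(?!_)"                   # __underline__
    r"|\*\*.+?\*\*(?!\*)"              # **bold**
    r"|\*.+?\*(?!\*)"                  # *italic*
    r"|_.+?_(?!_)"                     # _italic_
)


def split_segments(text: str) -> List[str]:
    """Split text into plain runs and marked-up segments, in order."""
    segments: List[str] = []
    pos = 0
    for match in SEGMENT_RE.finditer(text):
        if match.start() > pos:
            segments.append(text[pos:match.start()])  # plain text before the match
        segments.append(match.group(0))
        pos = match.end()
    if pos < len(text):
        segments.append(text[pos:])  # trailing plain text
    return segments
```

For 'ordinary, *italic* and **bold**' this yields the four raw pieces of the corresponding test case. Each marked-up segment still has to be unwrapped into its text and style flags, and each masked link into its text and URL.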

Additional Context

No response

This would require extra parsing and slow the bot down, so it won't be supported. Use escape_markdown and remove_markdown from discord.utils instead.