kadircet/COWNotifier

Improve HTML Parsing

Closed this issue · 0 comments

Description

Currently we just strip the HTML received from Discourse (except tags) and send the Telegram message with no formatting. We also put image links directly into the message body; since they are usually quite long, they take up valuable character space and look unpleasant. Discourse also sends emojis as images, so the bot converts every emoji into a link, and Telegram's link preview feature then produces unwanted behavior.

To solve these issues, we can use Telegram's limited HTML tag support. Most tags can be mapped to tags that Telegram supports (e.g. <h1> to <b>). Telegram also supports <a> tags, which can be used for links (including images). Telegram supports emojis natively, so we just need to replace the entire emoji tag with the appropriate Unicode character.
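
A rough sketch of how this could look, using only the standard library's `html.parser`. This is not the existing code in newsparser.py; the names `TAG_MAP`, `EMOJI`, `BLOCK_TAGS`, `TelegramHTMLConverter` and `to_telegram_html` are hypothetical. It also assumes that Discourse marks emoji images with `class="emoji"` and puts a shortcode such as `:smile:` in `alt`, which should be checked against real posts.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from incoming tags to tags Telegram's HTML mode accepts.
TAG_MAP = {
    "h1": "b", "h2": "b", "h3": "b", "strong": "b", "b": "b",
    "em": "i", "i": "i", "code": "code", "pre": "pre",
}

# Hypothetical, deliberately tiny shortcode table; a real one would be larger.
EMOJI = {":smile:": "\U0001F604", ":thumbsup:": "\U0001F44D"}

# Tags after which a line break is emitted, since Telegram has no <p> or <br>.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3"}


class TelegramHTMLConverter(HTMLParser):
    """Rewrites HTML into the small tag subset Telegram supports."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # output fragments, joined at the end
        self.stack = []   # Telegram tags that are currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            alt = attrs.get("alt") or ""
            if "emoji" in (attrs.get("class") or "").split():
                # Assumption: alt holds a shortcode; fall back to it if unknown.
                self.parts.append(escape(EMOJI.get(alt, alt), quote=False))
            elif attrs.get("src"):
                # Hide the long image URL behind a short link label.
                self.parts.append('<a href="%s">[image]</a>' % escape(attrs["src"]))
        elif tag == "a":
            if attrs.get("href"):
                self.parts.append('<a href="%s">' % escape(attrs["href"]))
                self.stack.append("a")
        elif tag == "br":
            self.parts.append("\n")
        elif tag in TAG_MAP:
            self.parts.append("<%s>" % TAG_MAP[tag])
            self.stack.append(TAG_MAP[tag])
        # Any other tag is dropped; its text content is still kept.

    def handle_endtag(self, tag):
        mapped = "a" if tag == "a" else TAG_MAP.get(tag)
        # Close a tag only if we actually opened it, so the output stays balanced.
        if mapped and self.stack and self.stack[-1] == mapped:
            self.stack.pop()
            self.parts.append("</%s>" % mapped)
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Telegram requires <, > and & in plain text to be escaped.
        self.parts.append(escape(data, quote=False))


def to_telegram_html(html):
    converter = TelegramHTMLConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts)
```

The result would be sent with Telegram's HTML parse mode. Tracking open tags in a stack keeps the output well-formed even when the input contains anchors without an `href` or tags that are simply dropped.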

Things To Do

  • Formatting with Telegram-supported tags
  • Link Handling
  • Emoji Handling

Related Files

newsparser.py