Handle links where url is hidden in html
yuletide opened this issue · 1 comments
yuletide commented
Example: https://mastodon.social/@spiegelsche@f.3ischn.de/109582649657814621
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]===== found mention in reply to yuletide id 109391862882784405 =====
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]{'id': 109582736490955865, 'created_at': datetime.datetime(2022, 12, 27, 0, 11, tzinfo=tzlocal()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://mastodon.social/users/yuletide/statuses/109582736425325992', 'url': 'https://mastodon.social/@yuletide/109582736425325992', 'replies_count': 0, 'reblogs_count': 0, 'favourites_count': 0, 'edited_at': None, 'favourited': False, 'reblogged': False, 'muted': False, 'bookmarked': False, 'content': '<p><span class="h-card"><a href="https://botsin.space/@nitterbot" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>nitterbot</span></a></span> can you help with this link</p>', 'filtered': [], 'reblog': None, 'account': {'id': 109391862882784405, 'username': 'yuletide', 'acct': 'yuletide@mastodon.social', 'display_name': 'alex yuletide', 'locked': False, 'bot': False, 'discoverable': True, 'group': False, 'created_at': datetime.datetime(2022, 6, 21, 0, 0, tzinfo=tzlocal()), 'note': '<p>Spatial solutions arch & web dev, social justice, civic tech, heavy metal. Available for work! 
\u2029Proud parent to <span class="h-card"><a href="https://botsin.space/@nitterbot" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>nitterbot</span></a></span>\u2028\u2029\u2029Past: Mapbox Solutions Architect & Tech Lead @ Community Team, <span class="h-card"><a href="https://mastodon.social/@recursecenter" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>recursecenter</span></a></span> fellow, founder of civic tech startup now part of @granicus, @codeforamerica fellow, @esri\u2029\u2028\u2029<a href="https://mastodon.social/tags/vegetarian" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>vegetarian</span></a> <a href="https://mastodon.social/tags/zen" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>zen</span></a> <a href="https://mastodon.social/tags/metal" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>metal</span></a> <a href="https://mastodon.social/tags/bassmusic" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bassmusic</span></a> <a href="https://mastodon.social/tags/dj" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>dj</span></a> <a href="https://mastodon.social/tags/maps" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>maps</span></a> <a href="https://mastodon.social/tags/photography" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>photography</span></a> <a href="https://mastodon.social/tags/webdev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webdev</span></a> <a href="https://mastodon.social/tags/politics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>politics</span></a></p>', 'url': 'https://mastodon.social/@yuletide', 'avatar': 
'https://files.botsin.space/cache/accounts/avatars/109/391/862/882/784/405/original/0efc492b3538e902.png', 'avatar_static': 'https://files.botsin.space/cache/accounts/avatars/109/391/862/882/784/405/original/0efc492b3538e902.png', 'header': 'https://files.botsin.space/cache/accounts/headers/109/391/862/882/784/405/original/1f2a8c1cc92143b4.png', 'header_static': 'https://files.botsin.space/cache/accounts/headers/109/391/862/882/784/405/original/1f2a8c1cc92143b4.png', 'followers_count': 150, 'following_count': 88, 'statuses_count': 171, 'last_status_at': datetime.datetime(2022, 12, 27, 0, 0), 'emojis': [], 'fields': [{'name': 'Birdsite', 'value': '<a href="HTTPS://twitter.com/yuletide" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible"></span><span class="">HTTPS://twitter.com/yuletide</span><span class="invisible"></span></a>', 'verified_at': None}, {'name': 'LinkedSite', 'value': '<a href="https://linkedin.com/in/alexyule" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="">linkedin.com/in/alexyule</span><span class="invisible"></span></a>', 'verified_at': None}]}, 'media_attachments': [], 'mentions': [{'id': 109543657746642932, 'username': 'nitterbot', 'url': 'https://botsin.space/@nitterbot', 'acct': 'nitterbot'}], 'tags': [], 'emojis': [], 'card': None, 'poll': None}
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]filtered status @nitterbot can you help with this link
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]no birdsite found, checking parent
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]checking parent
Current behavior: We use HTMLParser to strip out all HTML, so a URL that only appears inside an href is lost. Turns out statuses can be rich-formatted, which explains why the HTML is there in the first place. Some statuses have funky formatting, so there will likely be some weird edge cases if we leave the HTML in...
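To illustrate the problem, here is a minimal sketch of a parser that keeps the visible text but also collects every `href`, so a URL hidden behind link text is not thrown away. The `TextAndLinks` class, the `extract` helper, and the sample status are made up for illustration and are not the bot's actual code.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Hypothetical helper: collects visible text and every <a href> value."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def extract(content):
    """Return (visible_text, list_of_hrefs) for a status's HTML content."""
    parser = TextAndLinks()
    parser.feed(content)
    parser.close()
    return "".join(parser.text_parts), parser.hrefs


# Made-up status: the URL exists only in the href, not in the visible text.
sample = '<p>see <a href="https://twitter.com/someuser/status/123">this tweet</a></p>'
text, hrefs = extract(sample)
```

With text-only stripping, `text` has no twitter.com in it at all, which would explain a "no birdsite found" result; `hrefs` still has the link.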
Proposed behavior: Just replace every twitter.com with the nitter instance, in both the text and the HTML, and see what happens.
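A rough sketch of what that blanket replacement could look like, run on the raw content before any HTML stripping. `NITTER_INSTANCE` is a placeholder hostname and `to_nitter` is a hypothetical helper name; the `www.`/`mobile.` handling and the lookarounds (to avoid touching hosts like `nottwitter.com`) are my own assumptions, not something decided in this issue.

```python
import re

# Placeholder hostname (assumption); swap in whichever instance the bot uses.
NITTER_INSTANCE = "nitter.example"

# twitter.com with an optional www./mobile. prefix, case-insensitive.
# The lookarounds skip hosts that merely contain the string,
# e.g. nottwitter.com or twitter.com.evil.example.
TWITTER_HOST = re.compile(
    r"(?<![\w.-])(?:(?:www|mobile)\.)?twitter\.com(?![\w-])(?!\.\w)",
    re.IGNORECASE,
)


def to_nitter(content, instance=NITTER_INSTANCE):
    """Replace every twitter.com host in text or HTML with the nitter instance."""
    return TWITTER_HOST.sub(instance, content)


# Shortened from the 'Birdsite' profile field in the log above:
# the URL shows up in both the href and the visible span.
html = '<a href="HTTPS://twitter.com/yuletide"><span class="">HTTPS://twitter.com/yuletide</span></a>'
rewritten = to_nitter(html)
```

Since the same function works on plain text and on HTML, it would cover both the href and the visible link text in one pass. The scheme's casing (`HTTPS://`) is left untouched.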
yuletide commented
Another one https://botsin.space/@RobertMaguire@journa.host/109649227056480533
Seems to be a product of some crosspost bots, or it failed for some other reason.