matrix-org/matrix.org

fix TWIM-bot(?) eating links

Opened this issue · 7 comments

might be limited to matrix.to links.

[three screenshots attached]

Considering the nheko post was probably written in nheko (@deepbluev7, please confirm), https://github.com/haecker-felix/hebbot/blob/b230bd3749a90a9f6bb642536074154f3658a92a/src/render.rs#L347-L353 also has a bug, since it missed 2 things.

On further investigation, it's because the first regex checks for the start of the line. That doesn't hold here because of the surrounding brackets. The second one checks for a preceding space. That doesn't hold either. @haecker-felix Is the space in the second regex a typo or a bug? 🤔 I can make a PR if needed, I think.

Do I read it correctly that this regex is checking for room matrix.to links starting with a #? That only works for aliases sent by clients that don't follow the Matrix spec, since the spec suggests identifiers should be escaped, which means links start with https://matrix.to/#/%23, not https://matrix.to/#/# (and the URL RFC also says there should be no unescaped # in URLs).

EDIT: Seems like it works on the body of the message, which usually doesn't have a matrix.to link at all, so it probably isn't trying to match against a matrix.to link but instead against any alias.
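
To illustrate the escaping point, here is a minimal Python sketch (not taken from hebbot or any Matrix client) that percent-encodes an alias before appending it to the matrix.to prefix. The helper name `matrix_to_link` is made up for this example.

```python
from urllib.parse import quote

MATRIX_TO_PREFIX = "https://matrix.to/#/"


def matrix_to_link(identifier: str) -> str:
    """Hypothetical helper: build a matrix.to link with the identifier escaped."""
    # safe="" makes quote() escape every reserved character, including '#' and ':'.
    return MATRIX_TO_PREFIX + quote(identifier, safe="")


link = matrix_to_link("#matrix-spec:matrix.org")
print(link)
```

The resulting link starts with https://matrix.to/#/%23, so a pattern that expects a literal # right after the prefix would not match it.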

Also, the given regex cannot work with aliases in brackets, e.g. (#matrix-spec:matrix.org), as seen in the OP.
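
A rough Python sketch of the failure mode described in this thread. The patterns below are assumptions modelled on the description (one anchored at the start of the line, one requiring a leading space); they are not copied from hebbot's render.rs, and the alias grammar is deliberately simplified. The third pattern shows one possible relaxation using a negative lookbehind.

```python
import re

# Simplified alias shape: '#localpart:server'. This is an assumption,
# not the full Matrix grammar.
ALIAS = r"#[^\s:()]+:[A-Za-z0-9.-]*[A-Za-z0-9]"

# Hypothetical stand-ins for the two checks described in the thread.
START_ANCHORED = re.compile(r"^" + ALIAS)
SPACE_BEFORE = re.compile(r" " + ALIAS)

# Possible relaxation: accept any preceding character except a word
# character, '/' or '#', so aliases inside URLs are still skipped.
RELAXED = re.compile(r"(?<![\w/#])" + ALIAS)

text = "Spec news (#matrix-spec:matrix.org) is out"
url = "https://matrix.to/#/#nheko:nheko.im"

anchored_hit = START_ANCHORED.search(text)  # alias is not at line start
spaced_hit = SPACE_BEFORE.search(text)      # alias follows '(' rather than ' '
relaxed_hits = RELAXED.findall(text)
url_hits = RELAXED.findall(url)

print(anchored_hit, spaced_hit, relaxed_hits, url_hits)
```

With the bracketed alias, neither of the first two patterns finds anything, while the lookbehind variant picks up the alias and still ignores the one embedded in the matrix.to URL. Whether that trade-off suits hebbot is a separate question.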

I can't remember how or why I wrote the regex this way; it was too long ago. If it causes problems, please feel free to open a PR.

hm but what causes the user (@) pills to get lost? 🤔