Chat sanitization refactor
Opened this issue · 8 comments
This issue has had some scope/intention changes since it was created. Please ping me on discord to talk to me about what admins want/need from it if you're interested in working on it. It's still mostly the same but I'd like to make sure that the intended use cases are going to be covered.
Description
The entire system could use a refactor to work with regex and have prototypes or something define the replacements instead of hardcoding them. It probably can/should also be client side instead of server side. If the client does sanitization before sending each message, the server doesn't have to worry about running sanitizations on each message.
Sanitization should be able to match based on any combination of:
- Regex match
- Chat type (speak/whisper/emote)
- Chat channel (OOC/LOOC/Deadchat/IC)
Ideally, it'd support negative matches, (example: not a whisper) but that isn't super important. There should probably also be some way to order the substitution priority.
Sanitizations should be able to perform any combination of:
- Replace regex match
- Remove regex match
- Block message
- Send custom message to client
- Send custom message to server (like making the client emote)
- Play audio (accessibility and "visibility" feature)
Examples
lol
Conditions:
- Regex match
\blol\b
- speak/whisper/emote
- IC
Actions: - Remove regex match
- Send custom message to server: emote laughs
tbh
Conditions:
- Regex match
\btbh\b
- speak/whisper/emote
- IC
Actions: - Replace regex match with
to be honest
tbh for people too afraid to replace it
Conditions:
- Regex match
\btbh\b
- speak/whisper/emote
- IC
Actions: - Send custom message to client: Please avoid text speak, use "to be honest" instead of "tbh"
Double spaces
Fixes double spaces caused by removals, should happen last
Conditions:
- Regex match
{2,}
- speak/whisper/emote
- LOOC/OOC/IC/Deadchat
Actions: - Remove regex match
Is it really a good idea to move this to the client? If the client is hacked then they can bypass the sanitization from how I'm imagining this
Is it really a good idea to move this to the client? If the client is hacked then they can bypass the sanitization from how I'm imagining this
Originally it was intended to be used for stuff like correcting text speak. Since we want to use it for slurs now, it should have the ability to be server side, but having the option to make some of them client side could still be good because all the text speak stuff can be client side and it'll prevent the server from having to match the messages.
The steps I take when searching logs for slurs and other unacceptable words/sentences are as follow
- Regex match slurs and whatnot
- Look up the notes of whoever typed slurs and whatnot
- If no actions were taken against their behavior, search Chat weblogs using the date the issue happened to find the specific round in which the issue occured
- Read the 5 previous and 5 following messages
- Find out from context whether the issue is bannable or not
- Ban or send a message note depending on findings
For #24680: I'm religiously avoiding any form of regex for the chat sanitization work for two reasons:
- The number of people who understand how regex works is far smaller than the number of people capable of copy-and-pasting regex off of Stack Overflow into their project. "If you solve technical debt with regex, you create technical debt".
- Regex is insufficient for the needed work. For example, there's a responsibility to create a buildable sanitization job (via local events) that does different things, with the job being able to be modified easily via cvars. Dense regex makes this harder than it needs to be.
@FairlySadPanda The number of people who understand something is always far smaller than the number of people capable of copy/pasting it off of stack overflow. I think having basic text matching and highly encouraging people to use that over regex is fine, but I also think regex needs to be supported to enable filtering that isn't possible with basic text matching.
I don't think I'm familiar with any sort of half decent chat/text filtering system that doesn't support regex. In case it's unclear, I'm not saying that the filtering system should be based on regex substitutions, just that matching with regex should be supported
OK: I think the best compromise here is allow regex expressions to be specified as a filter step, but having a robust (ahem) filtration system otherwise.
Regex is just one of those things that tends to creep in as an innocent change. See also SQL 😉
Ya, the way I imagine it being used is as a condition for filtering
When something like
If SimpleTextMatch Then DoSomething
isn't good enough,
If RegexMatch Then DoSomething
can be used. DoSomething
doesn't need to involve any regex in it. As nice as it'd be for it to support having capture groups passed to it, all it needs at a minimum is the full match (or match position), similar to how the simplematch would make DoSomething
aware of where the match is so that, if replacement or something needs to happen, it knows where to do it. I think other than replacement, none of the possible filter actions need to know what was matched, just what the filter rule wants done
I've closed #24680 and will put a PR in for specifically text filtration that'll have regex as a specific option