Chat sanitization refactor

Question

Chat sanitization refactor

Opened this issue 2 years ago · 8 comments

This issue has had some scope/intention changes since it was created. Please ping me on discord to talk to me about what admins want/need from it if you're interested in working on it. It's still mostly the same but I'd like to make sure that the intended use cases are going to be covered.

Description

The entire system could use a refactor to work with regex and have prototypes or something define the replacements instead of hardcoding them. It probably can/should also be client side instead of server side. If the client does sanitization before sending each message, the server doesn't have to worry about running sanitizations on each message.

Sanitization should be able to match based on any combination of:

Regex match
Chat type (speak/whisper/emote)
Chat channel (OOC/LOOC/Deadchat/IC)

Ideally, it'd support negative matches, (example: not a whisper) but that isn't super important. There should probably also be some way to order the substitution priority.

Sanitizations should be able to perform any combination of:

Replace regex match
Remove regex match
Block message
Send custom message to client
Send custom message to server (like making the client emote)
Play audio (accessibility and "visibility" feature)

Examples

lol

Conditions:

Regex match \blol\b
speak/whisper/emote
IC
Actions:
Remove regex match
Send custom message to server: emote laughs

tbh

Conditions:

Regex match \btbh\b
speak/whisper/emote
IC
Actions:
Replace regex match with to be honest

tbh for people too afraid to replace it

Conditions:

Regex match \btbh\b
speak/whisper/emote
IC
Actions:
Send custom message to client: Please avoid text speak, use "to be honest" instead of "tbh"

Double spaces

Fixes double spaces caused by removals, should happen last
Conditions:

Regex match {2,}
speak/whisper/emote
LOOC/OOC/IC/Deadchat
Actions:
Remove regex match

Answer 1 · 2023-12-30T23:59:06.000Z

Is it really a good idea to move this to the client? If the client is hacked then they can bypass the sanitization from how I'm imagining this

Answer 2 · 2023-12-31T06:37:47.000Z

Is it really a good idea to move this to the client? If the client is hacked then they can bypass the sanitization from how I'm imagining this

Originally it was intended to be used for stuff like correcting text speak. Since we want to use it for slurs now, it should have the ability to be server side, but having the option to make some of them client side could still be good because all the text speak stuff can be client side and it'll prevent the server from having to match the messages.

Answer 3 · 2024-01-26T07:40:45.000Z

The steps I take when searching logs for slurs and other unacceptable words/sentences are as follow

Regex match slurs and whatnot
Look up the notes of whoever typed slurs and whatnot
If no actions were taken against their behavior, search Chat weblogs using the date the issue happened to find the specific round in which the issue occured
Read the 5 previous and 5 following messages
Find out from context whether the issue is bannable or not
Ban or send a message note depending on findings

Answer 4 · 2024-02-19T15:31:50.000Z

For #24680: I'm religiously avoiding any form of regex for the chat sanitization work for two reasons:

The number of people who understand how regex works is far smaller than the number of people capable of copy-and-pasting regex off of Stack Overflow into their project. "If you solve technical debt with regex, you create technical debt".
Regex is insufficient for the needed work. For example, there's a responsibility to create a buildable sanitization job (via local events) that does different things, with the job being able to be modified easily via cvars. Dense regex makes this harder than it needs to be.

Answer 5 · 2024-03-04T12:38:27.000Z

@FairlySadPanda The number of people who understand something is always far smaller than the number of people capable of copy/pasting it off of stack overflow. I think having basic text matching and highly encouraging people to use that over regex is fine, but I also think regex needs to be supported to enable filtering that isn't possible with basic text matching.

I don't think I'm familiar with any sort of half decent chat/text filtering system that doesn't support regex. In case it's unclear, I'm not saying that the filtering system should be based on regex substitutions, just that matching with regex should be supported

Answer 6 · 2024-03-04T12:42:13.000Z

OK: I think the best compromise here is allow regex expressions to be specified as a filter step, but having a robust (ahem) filtration system otherwise.

Regex is just one of those things that tends to creep in as an innocent change. See also SQL 😉

Answer 7 · 2024-03-04T17:28:57.000Z

Ya, the way I imagine it being used is as a condition for filtering

When something like
If SimpleTextMatch Then DoSomething
isn't good enough,
If RegexMatch Then DoSomething
can be used. DoSomething doesn't need to involve any regex in it. As nice as it'd be for it to support having capture groups passed to it, all it needs at a minimum is the full match (or match position), similar to how the simplematch would make DoSomething aware of where the match is so that, if replacement or something needs to happen, it knows where to do it. I think other than replacement, none of the possible filter actions need to know what was matched, just what the filter rule wants done

Answer 8 · 2024-03-04T18:00:51.000Z

I've closed #24680 and will put a PR in for specifically text filtration that'll have regex as a specific option