finnbear/rustrict

False positive cases

vtvz opened this issue · 8 comments

vtvz commented

I use this lib to check profanity in songs' lyrics. I'm really thankful that "glass" doesn't contain "ass" :) But I have some false positive cases I want to share:

  • I could say I miss you but it’s not the truth (severe sexual)
  • But this time only the hurt inside is what is real (mild profane)
  • You said your mother only smiled on her TV show (mild sexual mean)
  • I'm holding on to your love but they won't stop 'til it's dead, (severe sexual)
  • I know you hate yourself but that doesn't change, (moderate profane mean)

I highlighted the detected words.

finnbear commented

Hi, thanks for the comprehensive feedback!

Version 0.3.8 (or an earlier recent version) fixes false positives 1, 2, 4, and 5.

  • I implemented a much-needed fix to the automatic false positive finder by selectively disabling an optimization (reducing phrases to their prefix) that created many false positives
  • I added all these cases to test_negative.txt, so they should never happen again

In regards to case 3, I actually agree with the filter output (that sentence is mildly sexual and likely mildly mean), so the solution is for you to change your filtering threshold. For example, the following assertion passes:

// Notice "isnt" instead of "is"
// This differs from Type::INAPPROPRIATE, which includes Type::SEXUAL & Type::MILD
assert!("You said your mother only smiled on her TV show".isnt(
            Type::PROFANE
                | Type::OFFENSIVE
                | Type::SEXUAL & Type::MODERATE_OR_HIGHER
                | Type::MEAN & Type::SEVERE
        ));

(You can also pass this to is, Censor::with_xxx_threshold, etc.)
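
For example, a rough sketch of the same threshold passed to the Censor builder (the from_str / with_censor_threshold / censor_and_analyze methods that also appear later in this thread):

use rustrict::{Censor, Type};

let threshold = Type::PROFANE
    | Type::OFFENSIVE
    | Type::SEXUAL & Type::MODERATE_OR_HIGHER
    | Type::MEAN & Type::SEVERE;

let (_censored, analysis) = Censor::from_str("You said your mother only smiled on her TV show")
    .with_censor_threshold(threshold)
    .censor_and_analyze();

// The analysis is the detected Type; with this threshold, case 3 is not flagged.
assert!(!analysis.is(threshold));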

In the future, I might add multiple versions of Type::INAPPROPRIATE depending on the use case (the current version is good for my use case, online game chat, but not song lyrics).

I'll close the issue in a few days unless you have any other problems to share.

@vtvz

By the way, I took a look at your rustify project and noticed you are heavily relying on Type::SAFE. This is likely to be counterproductive, since Type::SAFE follows a very strict definition: the entire phrase must match something on this list, which almost no song lyrics will do (although I am willing to expand the list with common, 100% safe phrases). It is intended for restricting the chat of users who are deemed likely to evade the filter. Instead of using Type::SAFE, you should probably test for Type::PROFANE | Type::OFFENSIVE | Type::SEXUAL & Type::MODERATE_OR_HIGHER | Type::MEAN & Type::SEVERE (or something that meets your specific needs) and then negate the result.

To give an example: "This is a completely safe phrase" would not be Type::SAFE, because it's not on the list, but it also would not match Type::PROFANE | Type::OFFENSIVE | Type::SEXUAL & Type::MODERATE_OR_HIGHER | Type::MEAN & Type::SEVERE.
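
As a concrete sketch of that check (using the crate's str extension trait):

use rustrict::{CensorStr, Type};

// Not Type::SAFE (it isn't on the list), but acceptable by this negated test.
let acceptable = "This is a completely safe phrase".isnt(
    Type::PROFANE
        | Type::OFFENSIVE
        | Type::SEXUAL & Type::MODERATE_OR_HIGHER
        | Type::MEAN & Type::SEVERE,
);
assert!(acceptable);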

Also, for completeness, I will note that the following loop will only work for a subset of lyrics (since censored.chars().count() can be less than line.chars().count() if line contains diacritics or banned characters such as right-to-left markers).

                let bad_chars = line
                    .chars()
                    .into_iter()
                    .enumerate()
                    .filter_map(|(i, c)| {
                        if !censored.chars().nth(i).contains(&c) {
                            Some(i)
                        } else {
                            None
                        }
                    })
                    .collect::<Vec<_>>();

https://github.com/vtvz/rustify/blob/298ceff525ffee0dc7cd17bddd47ff28da01c0eb/src/profanity.rs#L31-L57
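
For what it's worth, a purely hypothetical defensive variant of that loop would compare positions only while the char counts still match, and skip highlighting otherwise:

let bad_chars: Vec<usize> = if censored.chars().count() == line.chars().count() {
    line.chars()
        .zip(censored.chars())
        .enumerate()
        .filter_map(|(i, (original, masked))| (original != masked).then(|| i))
        .collect()
} else {
    // Lengths differ (diacritics, banned characters, etc.), so the indices no
    // longer line up; skip highlighting for this line.
    Vec::new()
};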

vtvz commented

Wow, thanks a lot for your reply. TBH, I didn't expect anyone would review my shitty code. I'm pretty much a noob in Rust, and this is my first close-to-real-world application that does something useful.

I got your point about SAFE and I'll rewrite it to use the exact bit mask. But what can I do to not just hide profane words, but highlight them? In the past, I used the censor crate, which can give me the indices of bad chars, but it's not so great due to lots of false-positive cases with the word "ass" (glass, bass, pass, etc.). So I switched to this crate and it's a lot (I mean A LOT) better. I wish to help with development, but as I said, I'm a newbie.

BTW, some features I really miss in this lib are:

  • Non-global customization: the ability to have local dictionaries so users can add their own words and phrases.
  • A way to remove words from a dictionary or make them safe. If I try to make "hell" safe, I get this error. Maybe do something like this:
if typ.is(Type::SAFE) {
  current.typ = typ;
} else {
  current.typ |= typ;
}

And I have a question: what should I do with false-positive cases in the future? Should I open a new issue? Or create a PR with changes to the test_negative.txt file?

vtvz commented

I see a lot of triggers on "hate" as "moderate mean" and "hell" as "mild profane". In most cases, these words don't really have a bad meaning, but I have no way to remove them from the dictionary. I think it makes sense in a gaming context, but for song lyrics it's not the case. I really wish to have more control over the dictionary.

vtvz commented

btw "I could say I miss you but it’s not the truth" is still "moderate sexual"

finnbear commented

But what can I do to not just hide profane words, but highlight them?

I've done a lot of thinking (since the very beginning of this library), and I've come to the conclusion that the filter cannot simultaneously protect against diacritics (accents) and maintain accurate indices into the input without duplicating a ton of code from unicode_normalization. That's because the Unicode processing pipeline is three nested iterators that may consume extra input before yielding characters, scrambling the resulting indices. That means that, in order to give accurate indices, the part that removes/protects against diacritics (and other banned characters) would have to be removed or disabled.
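
To illustrate the index drift, here's a tiny example using unicode_normalization directly (the filter's actual pipeline is more involved):

use unicode_normalization::UnicodeNormalization;

// Decomposing an accented character adds a combining mark, so character
// positions after it no longer correspond to positions in the original input.
let input = "café"; // assuming the 'é' here is the precomposed U+00E9
let decomposed: String = input.nfd().collect();
assert_eq!(input.chars().count(), 4);
assert_eq!(decomposed.chars().count(), 5); // 'e' followed by U+0301 COMBINING ACUTE ACCENT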

In the past, I used the censor crate, which can give me the indices of bad chars, but it's not so great due to lots of false-positive cases with the word "ass" (glass, bass, pass, etc.). So I switched to this crate and it's a lot (I mean A LOT) better.

Yay!

BTW, some features I really miss in this lib are:

• Non-global customization: the ability to have local dictionaries so users can add their own words and phrases.

Adding words at runtime is problematic, since the false positive finder won't have a chance to run. The false positive finder is what makes this filter better than others. Without it, if you added "ass" at runtime, it would be detected within "glass", unless you manually added "glass" as a false positive.

Also, I'm concerned about runtime overhead. Do you have any suggestions for adding local dictionaries without lots of runtime overhead?

• A way to remove words from a dictionary or make them safe. If I try to make "hell" safe, I get this error (https://github.com/finnbear/rustrict/blob/0.3.8/src/radix.rs#L38). Maybe do something like this:
if typ.is(Type::SAFE) {
  current.typ = typ;
} else {
  current.typ |= typ;
}

I never intended Type::SAFE to be anything other than "this phrase is 100%, undeniably safe." The proper way of "clearing" a profanity would be setting it to Type::NONE. However, the current code wouldn't let you do that, so I have updated the crate so that add_word overwrites the type instead of bitwise-ORing with it.

With 0.3.9, this passes:

        let test_profanity = "thisisafakeprofanityfortesting";

        // SAFETY: Tests are run serially, so concurrent mutation is avoided.
        unsafe {
            add_word(test_profanity, Type::PROFANE & Type::SEVERE);
        }

        assert!(test_profanity.is(Type::PROFANE & Type::SEVERE));

        unsafe {
            // What you can do to remove a word entirely.
            add_word(test_profanity, Type::NONE);
        }

        assert!(test_profanity.isnt(Type::PROFANE));

I see a lot of triggers on "hate" as "moderate mean" and "hell" as "mild profane". In most cases, these words don't really have a bad meaning, but I have no way to remove them from the dictionary. I think it makes sense in a gaming context, but for song lyrics it's not the case. I really wish to have more control over the dictionary.

Keep in mind you can stop filtering out Type::MEAN & Type::MODERATE and Type::PROFANE & Type::MILD. Or you can remove them from the dictionary, via the above.
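
Here is a sketch of both options, assuming 0.3.9+ (where add_word overwrites the stored type, as above) and assuming add_word is in scope (it may require a feature flag); "hate" / "hell" are just the examples from above:

use rustrict::{add_word, CensorStr, Type};

// Option 1: keep the dictionary, but leave moderate meanness and mild
// profanity out of the threshold you filter on.
let threshold = Type::MEAN & Type::SEVERE | Type::PROFANE & Type::MODERATE_OR_HIGHER;
assert!("hate".isnt(threshold)); // detected as moderate mean, so below this threshold
assert!("hell".isnt(threshold)); // detected as mild profane, so below this threshold

// Option 2: remove the words from the dictionary entirely.
// SAFETY: add_word mutates global state, so avoid concurrent use of the filter.
unsafe {
    add_word("hate", Type::NONE);
    add_word("hell", Type::NONE);
}
assert!("hate".isnt(Type::MEAN));
assert!("hell".isnt(Type::PROFANE));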

And I have a question: what should I do with false-positive cases in the future? Should I open a new issue? Or create a PR with changes to the test_negative.txt file?

Dumping them in issues is fine too. You don't have to tell me exactly what part of the phrase was detected, or anything fancy. Just an issue titled "Some false positives" and issue text "[insert false positives here, one on each line]".

I will make a GitHub issue template for this (soon). EDIT: I have made a GitHub issue template called "Filtering Error".

btw "I could say I miss you but it’s not the truth" is still "moderate sexual"

I can't reproduce it with the latest version (and, as I said, it's in test_negative.txt). Are you manipulating the dictionary in any way before you test it?

    // This test passes:
    #[test]
    #[serial]
    fn issue_1() {
        // https://github.com/finnbear/rustrict/issues/1#issuecomment-1024426326
        assert!("I could say I miss you but it’s not the truth".isnt(Type::ANY));
    }
vtvz commented

I think I found the reason why it triggers. It has a \u{2005} character in between instead of a regular space:

        let (censored, typ) = rustrict::Censor::from_str(
            "I could say I miss you\u{2005}but\u{2005}it’s not the\u{2005}truth",
        )
        .with_censor_first_character_threshold(Type::ANY)
        .with_censor_threshold(Type::ANY)
        .censor_and_analyze();

        dbg!(censored);

        println!("{:#b}", typ.bits());
[src/state.rs:106] censored = "I could say I miss you\u{2005}bu****** not the\u{2005}truth"
0b10000000

And the funny thing is that I'm not going to use an \s regex just to replace all space-like characters with a simple space. So, I will do it myself with .replace('\u{2005}', " "). Looks ugly, but it's super specific to my exact case.

As for custom dictionaries, I cannot help with this yet. I tried to understand the current implementation of Trie, but it's still black magic to me for now.

I tried the new add_word logic and it works wonderfully. Thanks a lot for your patience and attention. I really appreciate it :)

EDIT: OK, now I'm aware that censored.len() != original.len(), thanks. But I feel censored.chars().nth(i).contains(&c) is pretty safe to use, because if the analysis result is safe or none, the bad_chars vector will be empty. And if the censored text is modified too much, it will just outline a bit more than required.

finnbear commented

I think I found the reason why it triggers. It has a \u{2005} character in between instead of a regular space:

Yeah, that would do it. False positives are invalidated when any suspicion is raised, such as with abnormal whitespace characters, mainly because some abnormal whitespace characters have zero width.

And the funny thing is that I'm not going to use an \s regex just to replace all space-like characters with a simple space. So, I will do it myself with .replace('\u{2005}', " "). Looks ugly, but it's super specific to my exact case.

Yeah, that might be the best option for now. Just so you know, this crate offers is_whitespace(c: char), which could be used to check whether a character is whitespace (if you end up hard-coding too many characters).
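
For example, a minimal sketch (assuming is_whitespace is importable from the crate root):

use rustrict::is_whitespace;

// Normalize any space-like character to a plain ' ' before filtering, instead
// of hard-coding individual characters such as '\u{2005}'.
fn normalize_spaces(line: &str) -> String {
    line.chars()
        .map(|c| if is_whitespace(c) { ' ' } else { c })
        .collect()
}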

As for custom dictionaries, I cannot help with this yet.

I think it might be possible to make the censor take a custom Trie, which is the dictionary implementation. I'll make an effort to do this soon, but can't guarantee anything.

I tried to understand the current implementation of Trie, but it's still black magic to me for now.

The Trie is essentially a Map<String, Type>, but it is searched one character at a time for efficiency. So the analysis/censoring algorithm looks like the following pseudo-code:

matches = []
// Make a single pass through the input
for character in input:
    // Start a new potential match at the current position
    matches.add(trie.root)
    // Advance every in-progress match by one character
    for m in matches:
        if let Some(next) = m.next(character):
            matches.add(next)
        if m.is_word():
            if m.is_false_positive():
                delete_intersecting_matches()
            else:
                process_and_censor(m)

See also: https://en.wikipedia.org/wiki/Trie#/media/File:Trie_example.svg
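
If it helps, here is a rough sketch of the data structure (not the crate's actual definition): each node maps a character to the next node, and a node reached at the end of a word carries that word's Type.

use std::collections::HashMap;
use rustrict::Type;

// Hypothetical simplification of the dictionary trie.
struct TrieNode {
    children: HashMap<char, TrieNode>,
    typ: Option<Type>, // Some(..) if the path from the root spells a dictionary word
}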

I tried the new add_word logic and it works wonderfully. Thanks a lot for your patience and attention. I really appreciate it :)

Yay! You're very welcome! I'm just happy someone is using my crate 😄