Blazingly fast cleaning of swear words (and their leetspeak variants) in strings
Currently there is a performance issue with the latest version (0.7.0); until it is fixed, the last stable version, 0.6.1, is recommended.
Inspired by the profanity package by Ben Friedland, this library is significantly faster than the original by using string comparison instead of regular expressions.
It supports modified spellings (such as `p0rn`, `h4NDjob`, `handj0b` and `b*tCh`).
This package works with Python 3.5+ and PyPy3.
```shell
pip3 install better_profanity
```
Only Unicode characters from categories `Ll`, `Lu`, `Mc` and `Mn` are added. More on Unicode categories can be found here.
Not all languages are supported yet; for example, Chinese is not.
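For reference, the category of any character can be checked with Python's standard `unicodedata` module (Chinese characters fall into category `Lo`, which is not among the supported ones):

```python
import unicodedata

# Ll = lowercase letter, Lu = uppercase letter,
# Mn = non-spacing mark, Mc = spacing combining mark, Lo = other letter
for ch in ["a", "Z", "\u00e1", "\u4e2d"]:  # "á" and "中" as escapes
    print(ch, unicodedata.category(ch))
# a Ll
# Z Lu
# á Ll
# 中 Lo
```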
```python
from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words()

    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.
```
All modified spellings of words in `profanity_wordlist.txt` will be generated. For example, the word `handjob` would be loaded into:

```
'handjob', 'handj*b', 'handj0b', 'handj@b', 'h@ndjob', 'h@ndj*b', 'h@ndj0b', 'h@ndj@b',
'h*ndjob', 'h*ndj*b', 'h*ndj0b', 'h*ndj@b', 'h4ndjob', 'h4ndj*b', 'h4ndj0b', 'h4ndj@b'
```
The full mapping of the library can be found in `profanity.py`.
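The variants above are the Cartesian product of each character's substitutions. A minimal sketch of the idea, using a hypothetical two-entry mapping (not the library's actual code):

```python
from itertools import product

# Hypothetical subset of the character mapping
CHARS_MAPPING = {
    "a": ("a", "@", "*", "4"),
    "o": ("o", "*", "0", "@"),
}

def generate_variants(word):
    # Each character expands to its mapped alternatives (or just itself)
    choices = [CHARS_MAPPING.get(ch, (ch,)) for ch in word]
    return {"".join(combo) for combo in product(*choices)}

print(len(generate_variants("handjob")))
# 16
```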
By default, `profanity` replaces each swear word with 4 asterisks `****`.
```python
from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.
```
The function `.censor()` also hides words separated not only by spaces but also by other dividers, such as `_`, `,` and `.` (though not by `@`, `$`, `*`, `"` or `'`).
```python
from better_profanity import profanity

if __name__ == "__main__":
    text = "...sh1t...hello_cat_fuck,,,,123"
    censored_text = profanity.censor(text)
    print(censored_text)
    # "...****...hello_cat_****,,,,123"
```
Each swear word is replaced with 4 instances of the character passed as the second parameter of `.censor()`.
```python
from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text, '-')
    print(censored_text)
    # You ---- of ----.
```
The function `.contains_profanity()` returns `True` if any word in the given string exists in the wordlist.
```python
from better_profanity import profanity

if __name__ == "__main__":
    dirty_text = "That l3sbi4n did a very good H4ndjob."
    profanity.contains_profanity(dirty_text)
    # True
```
Function `load_censor_words` takes a `List` of strings as censored words. The provided list will replace the default wordlist.
```python
from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)
    print(profanity.contains_profanity("Have a merry day! :)"))
    # True
```
Function `load_censor_words_from_file` takes the path to a text file containing one censored word per line.
```python
from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt')
```
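The expected file format is plain text with one word per line; a hypothetical `my_wordlist.txt` could be created like this:

```python
from pathlib import Path

# One censored word per line (hypothetical filename)
Path("my_wordlist.txt").write_text("happy\njolly\nmerry\n")

print(Path("my_wordlist.txt").read_text().splitlines())
# ['happy', 'jolly', 'merry']
```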
Functions `load_censor_words` and `load_censor_words_from_file` take a keyword argument `whitelist_words` to ignore words in a wordlist. It is best used when there are only a few words in the wordlist that you would like to ignore.
```python
# Use the default wordlist
profanity.load_censor_words(whitelist_words=['happy', 'merry'])

# or with your custom words as a List
custom_badwords = ['happy', 'jolly', 'merry']
profanity.load_censor_words(custom_badwords, whitelist_words=['merry'])

# or with your custom words as a text file
profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt', whitelist_words=['merry'])
```
Function `add_censor_words` adds a `List` of custom words to the current wordlist instead of replacing it.

```python
from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.add_censor_words(custom_badwords)
    print(profanity.contains_profanity("Happy you, fuck!"))
    # True
```
- As the library compares words character by character, the censor can easily be bypassed by adding extra characters to a word:

  ```python
  profanity.censor('I just have sexx')
  # returns 'I just have sexx'

  profanity.censor('jerkk off')
  # returns 'jerkk off'
  ```
- Any word in the wordlist that contains a non-space separator, such as `s & m`, cannot be recognised, and therefore won't be filtered out. This problem was raised in #5.
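This limitation follows from tokenising on dividers: a multi-token entry such as `s & m` never shows up as a single token. A minimal stdlib sketch of the idea (not the library's actual code):

```python
import re

wordlist = {"s & m"}

def contains(text):
    # Split on whitespace and common dividers, then compare token by token
    tokens = re.split(r"[\s_,.]+", text.lower())
    return any(token in wordlist for token in tokens)

print(contains("they discussed s & m"))
# False: no single token equals "s & m"
```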
```shell
python3 tests.py
```
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
This project is licensed under the MIT License - see the LICENSE.md file for details.
- Andrew Grinevich - Add support for Unicode characters.
- Jaclyn Brockschmidt - Optimize string comparison.
- Ben Friedland - For the original `profanity` package, which inspired this one.