lk-geimfari/mimesis

Remove inappropriate words from your random text selections

ek-nyc opened this issue · 3 comments

Feature request

Go through all your data sets and remove inappropriate words.

Thesis

For example, text.json contains words like 'milf' and 'milfhunter'. Those need to be removed because customers end up seeing them in their sample data sets, and that doesn't make anyone look good.

Reasoning

If you want companies using your tool, you need to cleanse the data.

I completely agree with this. The problem is that this data was collected from all over the internet, and not by me alone, so obviously I haven't seen and verified all of it. It's also worth noting that this kind of data got in there by accident.

I take this problem seriously. Fixing it will be a top priority for the next release.

Well, I removed everything I found using: https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/tree/master

I hope this will improve the quality of the datasets and that there won't be bad words in them, but I can't guarantee it, because checking all the datasets word by word is not something I can physically do.
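For anyone who wants to run a similar pass over their own data, here is a minimal sketch of blocklist filtering. It is not the script used for the cleanup above. The function name `scrub`, the `substring` flag, and the sample data are all hypothetical, and it assumes the dataset is nested dicts and lists of strings, as you would get from loading a JSON file.

```python
from typing import Any


def scrub(data: Any, blocklist: set[str], substring: bool = False) -> Any:
    """Return a copy of `data` with blocked string entries dropped from lists.

    Hypothetical helper: assumes `data` is built from dicts, lists and strings.
    """
    blocked = {word.lower() for word in blocklist}

    def is_blocked(value: str) -> bool:
        lowered = value.lower()
        if substring:
            # Catches compounds such as 'milfhunter', but may also hit
            # innocent words that merely contain a blocked string.
            return any(word in lowered for word in blocked)
        # Default: case-insensitive match on the whole entry only.
        return lowered in blocked

    if isinstance(data, dict):
        return {key: scrub(value, blocked, substring) for key, value in data.items()}
    if isinstance(data, list):
        return [
            scrub(item, blocked, substring)
            for item in data
            if not (isinstance(item, str) and is_blocked(item))
        ]
    # Scalars (including strings that are dict values) are returned unchanged.
    return data


# Made-up sample standing in for a loaded dataset file.
sample = {"words": ["apple", "Milf", "milfhunter", "river"]}
blocklist = {"milf"}

cleaned_exact = scrub(sample, blocklist)
cleaned_substring = scrub(sample, blocklist, substring=True)
```

Exact matching leaves compounds like 'milfhunter' in place, while substring matching removes them at the risk of false positives, so neither mode replaces a manual review of what was dropped.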

Version 16.0.0 with fixes is available now.