Automatic anonymisation by regex or dictionary
It would be awesome to have an automatic, multi-column anonymisation feature that uses regular expressions where possible (IBAN, card numbers, email addresses, national personal identification codes) and per-country dictionaries [1] for personal and geographic names.
In many cases (e.g. some bank statements), variables/attributes are not stored separately but bundled together in a single cell.
[1] name-dataset, forebears.io, firstname-database, topics/surnames.
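A minimal Python sketch of the requested behaviour for one bundled cell, assuming simplified patterns. Emails, IBANs (validated with the mod-97 check) and card numbers (validated with the Luhn check) are masked by regex, and names by dictionary lookup. `anonymise_cell`, the `<EMAIL>`-style placeholders and the two-entry `NAMES` set are hypothetical stand-ins; a real version would load the per-country lists from [1] and add national ID patterns.

```python
import re

# Simplified, hypothetical patterns: real IBAN, card and national-ID formats
# vary by country and would need per-country rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Country code, two check digits, then 4-character groups (optionally space-separated).
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
WORD_RE = re.compile(r"\b[^\W\d_]+\b")  # runs of letters only

# Hypothetical stand-in for a per-country name dictionary such as those in [1].
NAMES = {"john", "smith"}


def iban_ok(candidate: str) -> bool:
    """IBAN mod-97 check: move the first 4 chars to the end, map letters to 10..35."""
    s = candidate.replace(" ", "")
    rearranged = s[4:] + s[:4]
    return int("".join(str(int(ch, 36)) for ch in rearranged)) % 97 == 1


def luhn_ok(candidate: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    digits = [int(c) for c in candidate if c.isdigit()]
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def anonymise_cell(text: str) -> str:
    """Mask emails, IBANs, card numbers and dictionary names inside one free-text cell."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    # Only mask matches that pass their checksum, to limit false positives.
    text = IBAN_RE.sub(lambda m: "<IBAN>" if iban_ok(m.group()) else m.group(), text)
    text = CARD_RE.sub(lambda m: "<CARD>" if luhn_ok(m.group()) else m.group(), text)
    return WORD_RE.sub(lambda m: "<NAME>" if m.group().lower() in NAMES else m.group(), text)


if __name__ == "__main__":
    cell = ("Payment from John Smith, card 4111 1111 1111 1111, "
            "IBAN DE89 3704 0044 0532 0130 00, mail john.smith@example.com")
    print(anonymise_cell(cell))
    # Payment from <NAME> <NAME>, card <CARD>, IBAN <IBAN>, mail <EMAIL>
```

Emails are masked first so their local parts are never mistaken for names, and the checksums keep ordinary numbers (amounts, reference numbers) from being masked as cards or IBANs.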
You mean that the values are not in separate CSV columns or JSON/XML properties, but that name, email and IBAN are mixed in one column, and you still want to anonymize them?
The library we use for anonymization already handles country-specific data: just pass the --locale parameter, and the generated names, addresses, etc. will be country-specific.
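The thread does not name the underlying library, so the following is not its API. It is a stdlib-only sketch of locale-aware pseudonymisation under assumed names: `FAKE_NAMES` is a hypothetical, tiny per-locale pool, and the hypothetical `pseudonymise` hashes the original value so the same input always maps to the same fake name for a given locale.

```python
import hashlib

# Hypothetical per-locale replacement pools; a real tool would draw on a much
# larger locale-specific data source.
FAKE_NAMES = {
    "de_DE": ["Anna Schmidt", "Lukas Müller", "Marie Weber"],
    "fr_FR": ["Camille Dubois", "Lucas Martin", "Léa Bernard"],
}


def pseudonymise(value: str, locale: str, salt: str = "") -> str:
    """Map `value` to a fake name from the `locale` pool; same input gives same output."""
    pool = FAKE_NAMES[locale]
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer index into the pool.
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


if __name__ == "__main__":
    print(pseudonymise("John Smith", "de_DE"))
    print(pseudonymise("John Smith", "fr_FR"))
```

Because the mapping is deterministic, rows that mention the same person stay linkable after anonymisation; without a secret `salt`, anyone holding the pool could recompute it for a guessed name.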
Hello Alex,
I added a feature that lets you match text in CSV cells using regular expressions. See readme.md for details.
I hope this resolves your issues!
Have fun....
Christof
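readme.md is not reproduced in this thread, so this is only a sketch of the general technique, not the tool's implementation or rule format: every regex in a hypothetical `RULES` list is applied to each cell of the chosen columns using Python's `csv` module. `anonymise_csv` and the placeholder strings are illustrative names.

```python
import csv
import io
import re
from typing import Iterable

# Hypothetical rule list of (pattern, replacement) pairs; the tool's real rule
# format is documented in its readme.md and may differ.
RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<EMAIL>"),
    (re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"), "<IBAN>"),
]


def anonymise_csv(text: str, columns: Iterable[str]) -> str:
    """Apply every rule to each cell of the selected columns; return the new CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    cols = list(columns)
    for row in reader:
        for col in cols:
            for pattern, replacement in RULES:
                row[col] = pattern.sub(replacement, row[col])
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    data = ('date,details\n'
            '2024-01-02,"Transfer to jane@example.org, IBAN DE89 3704 0044 0532 0130 00"\n')
    print(anonymise_csv(data, ["details"]), end="")
```

Only the listed columns are rewritten, so structured fields such as dates pass through unchanged.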