cdaller/multi_anonymizer

auto anonymisation by regex or dictionary


pax commented

It would be awesome to have an automatic / multi-column anonymisation feature, using regular expressions where possible (IBAN, card numbers, email, national personal identification codes) and dictionaries [1] for names / geographic names, per country.

In many cases (e.g. some bank statements), variables/attributes are not stand-alone but bundled in one cell.

[1] name-dataset, forebears.io, firstname-database, topics/surnames.
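A minimal Python sketch of the kind of detection described above, applied to a single bundled cell. The regexes, the tiny `NAME_DICTIONARY`, and all function names are illustrative assumptions, not part of multi_anonymizer. The IBAN mod-97 (ISO 13616) and Luhn checksums are added so that digit runs that merely look like an IBAN or card number are not flagged.

```python
import re

# Simplified patterns (assumptions); real IBAN/card/email formats vary more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
WORD_RE = re.compile(r"\w+")

# Hypothetical tiny name dictionary; a real one would be loaded per country
# from sources such as those listed in [1].
NAME_DICTIONARY = {"anna", "müller"}


def iban_is_valid(candidate: str) -> bool:
    """ISO 13616 mod-97 check: move the first 4 chars to the end, letters -> 10..35."""
    s = candidate.replace(" ", "")
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def luhn_is_valid(candidate: str) -> bool:
    """Luhn checksum used by payment card numbers (separators are ignored)."""
    digits = [int(d) for d in candidate if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(cell: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for sensitive values found in one free-text cell."""
    found: list[tuple[str, str]] = []
    taken: list[tuple[int, int]] = []  # character spans already claimed

    def claim(kind: str, m: re.Match) -> None:
        # Skip a match that overlaps an earlier one (e.g. "anna" inside an email).
        if any(m.start() < end and start < m.end() for start, end in taken):
            return
        taken.append(m.span())
        found.append((kind, m.group()))

    for m in EMAIL_RE.finditer(cell):
        claim("email", m)
    for m in IBAN_RE.finditer(cell):
        if iban_is_valid(m.group()):
            claim("iban", m)
    for m in CARD_RE.finditer(cell):
        if luhn_is_valid(m.group()):
            claim("card", m)
    for m in WORD_RE.finditer(cell):
        if m.group().lower() in NAME_DICTIONARY:
            claim("name", m)
    return found
```

Matches are claimed in a fixed order (email, IBAN, card, name) and overlapping matches are skipped, so a first name that also appears inside an email address is reported only once, as part of the email.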

You mean that you do not have separate columns in CSV or separate JSON/XML properties, but a mix of name/email/IBAN in one column, and you still want to anonymize the data?

The library used for anonymization can handle country-specific anonymization; just use the --locale parameter. The generated names, addresses, etc. will then be country-specific.

pax commented

have a mix of name/email/iban in one column

Yes, annoyingly so.

Also, I can imagine other cases: columns with long free-text content that might contain strings needing anonymisation.


Hello Alex,

I added a feature that allows matching text in CSV cells using regular expressions. See readme.md for details.
I hope this resolves your issues!

Have fun...
Christof
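The regex-in-cell matching mentioned in the last comment can be pictured with a short Python sketch that uses only the standard `csv` and `re` modules. It is not the tool's implementation or configuration format (readme.md describes that); `anonymize_csv`, the email pattern and the `userN@example.com` placeholders are hypothetical. Each distinct match is mapped to a stable placeholder so that repeated values stay linkable across rows.

```python
import csv
import io
import re
from itertools import count

# Simplified email pattern (assumption), used here as the example regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_csv(text: str, pattern: re.Pattern, columns: set[str]) -> str:
    """Replace every regex match inside the given columns with a stable placeholder.

    The same original value always maps to the same placeholder, so rows that
    shared a value before anonymization still share it afterwards.
    """
    mapping: dict[str, str] = {}
    counter = count(1)

    def replace(m: re.Match) -> str:
        original = m.group()
        if original not in mapping:
            mapping[original] = f"user{next(counter)}@example.com"
        return mapping[original]

    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in columns:
            if row.get(col):  # only touch configured, non-empty cells
                row[col] = pattern.sub(replace, row[col])
        writer.writerow(row)
    return out.getvalue()
```

For example, with `sample = "id,note\n1,Paid by a@x.org and b@y.org\n2,Refund to a@x.org\n"`, calling `anonymize_csv(sample, EMAIL_RE, {"note"})` replaces `a@x.org` with `user1@example.com` in both rows and `b@y.org` with `user2@example.com`, leaving the surrounding text of each cell intact.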