auto anonymisation by regex or dictionary

Question


Opened this issue 8 months ago · 3 comments

It would be awesome to have an automatic, multi-column anonymisation feature, using regular expressions where possible (IBAN, card numbers, email addresses, national personal identification codes) and dictionaries [1] for personal names and geographic names, per country.

In many cases (some bank statements, for example), variables/attributes are not stand-alone but bundled together in a single cell.

[1] name-dataset, forebears.io, firstname-database, topics/surnames.
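To illustrate the request, here is a minimal Python sketch (not part of the project) that masks several values bundled in one cell. It uses regular expressions for structured identifiers, confirmed with a checksum to cut false positives (ISO 13616 mod-97 for IBANs, Luhn for card numbers), plus a dictionary lookup for names. The patterns, the `anonymise_cell` helper, and the tiny name set are hypothetical; a real dictionary would be loaded per country from sources such as those listed in [1].

```python
import re

# Hypothetical, simplified patterns; real-world formats vary by country and issuer.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# IBAN candidate: country code, check digits, then 4-char groups (optionally space-separated).
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")
# Card candidate: 13-19 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def iban_is_valid(candidate: str) -> bool:
    """ISO 13616 mod-97 check: move the first 4 chars to the end, letters -> 10..35."""
    s = candidate.replace(" ", "")
    if not 15 <= len(s) <= 34:
        return False
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def anonymise_cell(text: str, names: set[str]) -> str:
    """Replace emails, valid IBANs, valid card numbers and dictionary names with tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Mask IBANs before cards so the IBAN's digit groups are not mistaken for a card number.
    text = IBAN_RE.sub(lambda m: "[IBAN]" if iban_is_valid(m.group()) else m.group(), text)
    text = CARD_RE.sub(lambda m: "[CARD]" if luhn_ok(m.group()) else m.group(), text)
    if names:
        # Longest names first so that multi-word or longer entries win over their prefixes.
        alternatives = "|".join(map(re.escape, sorted(names, key=len, reverse=True)))
        text = re.sub(r"\b(?:" + alternatives + r")\b", "[NAME]", text)
    return text


cell = ("Transfer from Alex Smith, alex@example.com, "
        "IBAN DE89 3704 0044 0532 0130 00, card 4111 1111 1111 1111")
print(anonymise_cell(cell, {"Alex", "Smith"}))
# Transfer from [NAME] [NAME], [EMAIL], IBAN [IBAN], card [CARD]
```

The checksum step is a design choice: a bare regex for "long digit runs" would also hit reference numbers and amounts in bank statements, whereas mod-97 and Luhn reject most accidental matches.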

Answer 1 · 2024-03-27T13:07:49.000Z

You mean that you do not have separate columns in CSV (or properties in JSON/XML), but a mix of name/email/IBAN in one column, and you still want to anonymize the data?

The library used for anonymization can handle country-specific anonymization: just use the --locale parameter, and the generated names, addresses, etc. will be country-specific.

Answer 2 · 2024-03-28T13:12:18.000Z

have a mix of name/email/iban in one column

Yes, annoyingly so.

Also, I can imagine other cases: columns with long text content that might contain strings needing anonymisation.

Answer 3 · 2024-03-28T15:45:35.000Z

Hello Alex,

I added a feature that lets you match text in CSV cells using regular expressions. See readme.md for details.
I hope this resolves your issues!

Have fun....
Christof
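The idea of regex matching inside CSV cells can be pictured with a minimal, hypothetical Python sketch: every cell, including long free-text ones, is run through a list of (pattern, replacement) rules. The rule list and the `anonymise_csv` function are assumptions for illustration only, not the tool's implementation or configuration format (that is documented in its readme.md). The rules are deliberately simple and do no checksum validation.

```python
import csv
import io
import re

# Hypothetical rule set: (compiled regex, replacement token), applied in order.
RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"), "[IBAN]"),
]


def anonymise_csv(src: str, rules=RULES) -> str:
    """Apply every rule to every cell; csv handles quoting of cells with commas/newlines."""
    reader = csv.reader(io.StringIO(src))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        new_row = []
        for cell in row:
            for pattern, token in rules:
                cell = pattern.sub(token, cell)
            new_row.append(cell)
        writer.writerow(new_row)
    return out.getvalue()


data = 'date,details\n2024-03-28,"Paid to bob@example.com from DE89 3704 0044 0532 0130 00"\n'
print(anonymise_csv(data), end="")
# date,details
# 2024-03-28,Paid to [EMAIL] from [IBAN]
```

Parsing with the `csv` module rather than splitting lines on commas keeps quoted cells that contain commas or line breaks intact, which matters for bank statements with free-text descriptions.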