ankane/pdscan

Ideas

ankane opened this issue · 4 comments

Could you add options to filter out certain types of PII, or matches below a certain quantity? For example, I don't care about files that contain email addresses, or about files that contain fewer than 10 phone numbers.
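To make the request concrete, here is a minimal sketch of the kind of filtering meant, not how pdscan implements it. It assumes findings arrive as `(path, rule)` pairs, one per match; `filter_findings`, `ignored_rules`, and `min_counts` are hypothetical names used only for illustration.

```python
from collections import Counter


def filter_findings(findings, ignored_rules=(), min_counts=None):
    """Drop ignored rule types and rules that match fewer times than a threshold.

    findings: iterable of (path, rule) pairs, one pair per match (assumed shape).
    Returns {path: {rule: count}} for the findings that survive the filters.
    """
    min_counts = min_counts or {}
    counts = Counter(findings)  # (path, rule) -> number of matches
    report = {}
    for (path, rule), n in counts.items():
        if rule in ignored_rules:
            continue  # the user does not care about this type of PII
        if n < min_counts.get(rule, 1):
            continue  # too few matches in this file to be worth reporting
        report.setdefault(path, {})[rule] = n
    return report


findings = (
    [("a.txt", "email")] * 3
    + [("a.txt", "phone")] * 12
    + [("b.txt", "phone")] * 4
)
print(filter_findings(findings, ignored_rules={"email"}, min_counts={"phone": 10}))
```

With this input, only the phone matches in `a.txt` remain: emails are ignored everywhere, and `b.txt` has fewer than 10 phone numbers.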

A few more ideas:

  • Dockerized version
  • Override defaults using environment variables (e.g., a custom list of surnames, non-US patterns for phone numbers)
  • Whitelist certain values (e.g., credit card 0123-4567-8910)
  • It would be useful to have the line and column numbers for each PII match found. For example: credit card number 1234-5768-9876-4321 appears on line 12, column 25 of the file /home/foo/bar.txt.
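As a rough illustration of the line and column idea, the sketch below scans text line by line and reports 1-based positions. It is an assumption about how such reporting could work, not pdscan's behavior; `locate_matches` is a hypothetical helper, and the credit card pattern is a simplified example.

```python
import re


def locate_matches(text, pattern):
    """Return (line, column, matched_text) for every match, using 1-based positions.

    Columns are counted in characters, and a match that spans a line
    break is not detected because each line is scanned on its own.
    """
    regex = re.compile(pattern)
    results = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for m in regex.finditer(line):
            results.append((line_no, m.start() + 1, m.group()))
    return results


sample = "name: foo\ncard: 1234-5768-9876-4321\n"
for line_no, col, value in locate_matches(sample, r"\d{4}-\d{4}-\d{4}-\d{4}"):
    print(f"{value} appears on line {line_no}, column {col}")
```

Scanning per line keeps the bookkeeping simple; a scanner that must handle matches across line breaks would instead search the whole text and convert offsets to line and column afterwards.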

  • Can you add support for user-defined regex rules? For example, a rule such as '\d{16}' to find credit card numbers that are saved in files without the hyphens.
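One thing to keep in mind with a rule like '\d{16}': on its own it also matches the first 16 digits of any longer run of digits. The sketch below, which uses Python's `re` module purely for illustration, compares the bare pattern with a bounded variant. The lookaround syntax is an assumption about the regex engine and is not available in every engine, and `count_matches` is a hypothetical helper.

```python
import re

# The pattern from the suggestion above: any 16 consecutive digits.
BARE = re.compile(r"\d{16}")

# Bounded variant: exactly 16 digits, not preceded or followed by another digit.
BOUNDED = re.compile(r"(?<!\d)\d{16}(?!\d)")


def count_matches(regex, text):
    """Count non-overlapping matches of a compiled regex in text."""
    return len(regex.findall(text))


text = "card 1234576898764321 id 12345678901234567890"
print(count_matches(BARE, text))     # counts the 16-digit card and part of the 20-digit id
print(count_matches(BOUNDED, text))  # counts only the 16-digit card
```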

Hey all, thanks for all the suggestions (and sorry for the delay)!

@EricSeiffert There are now --only and --except options you can use to select which rules to run. There's also an experimental --min-count option to specify the minimum number of rows/documents/lines for a match (docs).

@brennoo There's now a Docker image. There's also support for E.164 phone numbers out of the box. I've added the other ideas to #12.

@manivannanpk There's now a --pattern option to scan for a custom pattern (docs). I've added the idea of showing match locations to #12.

I think it's easier to discuss individual ideas in separate issues, so please create a new issue if you'd like to discuss anything in more detail.