datafaker-net/datafaker

Allow an easy way to create random PII/PHI data

erikpragt-connectid opened this issue · 3 comments

In a project I'm working on, I'm using AWS Comprehend to detect and redact PII data.

It would be nice if Datafaker could generate text that explicitly contains PII, such as birth dates, full names, credit card data, medical records, etc.

Perhaps something like:

faker.pii().medical()
faker.pii().banking()
faker.pii().general()

or

faker.medical().phi()
faker.banking().pii()

Not sure yet, but the output should be something like:

Hello Zhang Wei, I am John. Your AnyCompany Financial Services, LLC credit card account 1111-0000-1111-0008 has a minimum payment of $24.53 that is due by July 31st. Based on your autopay settings, we will withdraw your payment on the due date from your bank account number XXXXXX1111 with the routing number XXXXX0000.

Customer feedback for Sunshine Spa, 123 Main St, Anywhere. Send comments to Alice at sunspa@mail.com. I enjoyed visiting the spa. It was very comfortable but it was also very expensive. The amenities were ok but the service made the spa a great experience.

(This is the AWS sample message.)

I'll most likely build the above, but suggestions / feedback welcome.

How about the simplest way: put several suitable templates into a YAML config, with expressions that are substituted with generated data during generation?
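
To make the template idea concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the class name `PiiTemplates`, the `render` helper, the placeholder keys (`name`, `card`, `amount`), and the `#{...}` placeholder syntax, which is only loosely modeled on expression-style templates. The generators are hard-coded stand-ins; in a real setup they would presumably delegate to Datafaker, and the templates would be loaded from a YAML file rather than inlined.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills #{key} placeholders in a template with generated values.
class PiiTemplates {
    // Matches placeholders such as #{name} or #{card}.
    private static final Pattern PLACEHOLDER = Pattern.compile("#\\{([A-Za-z_.]+)\\}");

    static String render(String template, Map<String, Supplier<String>> generators) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Supplier<String> gen = generators.get(m.group(1));
            // Unknown keys are left untouched so that typos stay visible in the output.
            String value = (gen == null) ? m.group() : gen.get();
            // quoteReplacement keeps characters like '$' in generated values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        Map<String, Supplier<String>> generators = new LinkedHashMap<>();
        // Stand-in generators (assumptions); real ones would call a data generator.
        generators.put("name", () -> rnd.nextBoolean() ? "Zhang Wei" : "Alice");
        generators.put("card", () -> String.format("%04d-%04d-%04d-%04d",
                rnd.nextInt(10000), rnd.nextInt(10000), rnd.nextInt(10000), rnd.nextInt(10000)));
        generators.put("amount", () -> String.format("$%d.%02d", rnd.nextInt(100), rnd.nextInt(100)));

        String template = "Hello #{name}, your credit card account #{card} "
                + "has a minimum payment of #{amount}.";
        System.out.println(render(template, generators));
    }
}
```

Keeping the generators in a map means the templates can change freely without touching the code that produces the individual values.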

Yes, I think something like that was going to be the approach. We already have all the components to generate the text; I just wasn't sure where to put it. And maybe it's not a good idea: a custom faker might be a much better solution anyway, since I think it's better if Datafaker provides the atomic elements and the tools around it focus on composing these fields, e.g. for the scenario I need it for. Thoughts?

Yes, a custom faker probably suits better here, since the templates are likely to change frequently. At the same time, it might be useful to have something like that in the doc examples.