GoogleCloudPlatform/terraform-google-secured-data-warehouse

Add support for providing inspect configurations in the Python Dataflow template

renato-rudnicki opened this issue · 1 comment

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave +1 or me too comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If the issue is assigned to a user, that user is claiming responsibility for the issue.

Description

Cloud Data Loss Prevention (DLP) inspection finds potentially sensitive information in content. Sensitive information found this way is a candidate for de-identification or re-identification by DLP.

To de-identify or re-identify tabular data, we need to create a template that binds a field to a transformation.
During de-identification or re-identification, we can transform either the whole content of the field or only a specific part of it. The latter case requires an inspect configuration for the specified infoType.
For de-identification, the DLP API scans the content of the field in each record, looking for the specified infoType. If the infoType is found, the matching text is encrypted.

For example, if a field contains medical text and we want to encrypt the name of a disease, we need to use the inspect config feature to find it and replace it with a surrogate.

Code Date Message
1123 11/02/2021 Alice Jones was diagnosed with the MEDICAL_TERM_SURROGATE(3):CRYPTO_VALUE.
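
The two ways of binding a field to a transformation described above can be sketched in Python as a plain dictionary builder. This is only an illustration, not the template's actual code: the helper name `build_field_transformation`, the `MEDICAL_TERM` infoType name, and the key placeholders are assumptions. The dictionary keys follow the snake_case field names of the DLP `FieldTransformation` message.

```python
from typing import Optional


def build_field_transformation(
    field_name: str,
    surrogate_name: str,
    wrapped_key: str,
    kms_key_name: str,
    info_type: Optional[str] = None,
) -> dict:
    """Bind one table column to a deterministic-encryption transformation.

    Hypothetical helper. If info_type is None the whole field is transformed;
    otherwise only the text matching that infoType is transformed.
    """
    primitive = {
        "crypto_deterministic_config": {
            "crypto_key": {
                "kms_wrapped": {
                    # Placeholders: a KMS-wrapped key and the KMS key resource name.
                    "wrapped_key": wrapped_key,
                    "crypto_key_name": kms_key_name,
                }
            },
            # Name that prefixes the encrypted value in the output.
            "surrogate_info_type": {"name": surrogate_name},
        }
    }
    transformation = {"fields": [{"name": field_name}]}
    if info_type is None:
        # Whole-field case: no inspection is needed.
        transformation["primitive_transformation"] = primitive
    else:
        # Partial case: DLP inspects the field and encrypts only the matches.
        transformation["info_type_transformations"] = {
            "transformations": [
                {
                    "info_types": [{"name": info_type}],
                    "primitive_transformation": primitive,
                }
            ]
        }
    return transformation


# Example: encrypt only the medical term inside the "Message" column.
message_rule = build_field_transformation(
    "Message",
    "MEDICAL_TERM_SURROGATE",
    wrapped_key="<wrapped-key>",      # assumption: supplied by the deployment
    kms_key_name="<kms-key-name>",    # assumption: supplied by the deployment
    info_type="MEDICAL_TERM",         # assumption: infoType used for illustration
)
```

The resulting dictionary would sit under `record_transformations.field_transformations` in a de-identify configuration.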

For re-identification, a custom infoType of type surrogateType, matching the surrogate name specified during de-identification (for example, MEDICAL_TERM_SURROGATE), must be passed through an inspect configuration so that the DLP API can find the surrogate and decrypt it.

For example, in the previous example the sensitive value "flu" was encrypted. With the inspect config in the Terraform code below, the DLP API will be able to find the specified surrogate.

"inspectConfig": {
  "customInfoTypes": [
    {
      "infoType": {
        "name": "MEDICAL_TERM_SURROGATE"
      },
      "surrogateType": {}
    }
  ]
},

The re-identified result will be:

Code Date Message
1123 11/02/2021 Alice Jones was diagnosed with the flu.
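
As a rough sketch of how the template could accept this, the inspect configuration above can be built in Python and placed next to the re-identify configuration in a single request dictionary. The helper names `build_surrogate_inspect_config` and `build_reidentify_request` and the project and config placeholders are assumptions for illustration. The request shape follows the `reidentify_content` call of the `google-cloud-dlp` Python client, which uses snake_case field names. No API call is made here.

```python
def build_surrogate_inspect_config(surrogate_name: str) -> dict:
    """Return an inspect config that detects one surrogate infoType.

    Hypothetical helper; mirrors the JSON inspectConfig shown above.
    """
    return {
        "custom_info_types": [
            {
                "info_type": {"name": surrogate_name},
                # Empty message: marks this custom infoType as a surrogate.
                "surrogate_type": {},
            }
        ]
    }


def build_reidentify_request(
    project_id: str,
    reidentify_config: dict,
    surrogate_name: str,
    item: dict,
) -> dict:
    """Assemble a re-identification request that carries the inspect config."""
    return {
        "parent": f"projects/{project_id}",
        "reidentify_config": reidentify_config,
        "inspect_config": build_surrogate_inspect_config(surrogate_name),
        "item": item,
    }


# Example with placeholder values; the reidentify_config is left empty here
# and would hold the same crypto settings used for de-identification.
request = build_reidentify_request(
    project_id="my-project",  # assumption: illustrative project ID
    reidentify_config={},     # assumption: filled in by the caller
    surrogate_name="MEDICAL_TERM_SURROGATE",
    item={"value": "Alice Jones was diagnosed with the MEDICAL_TERM_SURROGATE(3):CRYPTO_VALUE."},
)
# With the client library installed, this dictionary could be passed as
# DlpServiceClient().reidentify_content(request=request).
```

Keeping the inspect config as a separate input means the same de-identify or re-identify template can be reused with different surrogate names.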

I would like to request that support for providing an inspection config be added to the provided Dataflow flex template, so that it covers the use of the inspection config described in this feature request.

References

https://cloud.google.com/dlp/docs/infotypes-reference
https://cloud.google.com/dlp/docs/transformations-reference

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days