cal-itp/data-infra

Remove sensitive Littlepay data from Pipeline


User story / feature request

In order to comply with Caltrans security parameters, we should remove all sensitive Littlepay data from our data pipeline.

  1. We need to remove the data we have already collected, both from the raw data and from any BigQuery tables that ingested it.
  2. We need to modify the DAG task that ingests Littlepay data so that it does not store any sensitive data in our cloud system.

Acceptance Criteria

The sensitive data that needs to be removed includes:

  1. customer_id? This one is a bit of a gray area, since it is a hash and not directly personally identifiable.
  2. masked_pan
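
To make item 2 of the user story concrete, here is a minimal sketch of stripping the fields above from a raw Littlepay CSV before it is written to cloud storage. It is an illustration only: the function name `strip_sensitive_columns`, the `SENSITIVE_FIELDS` set, and the assumption that the raw file is a CSV with a header row are all hypothetical, and this is not the existing DAG task.

```python
import csv
import io

# Assumed list of columns to drop; whether customer_id belongs here is still open.
SENSITIVE_FIELDS = frozenset({"masked_pan", "customer_id"})


def strip_sensitive_columns(raw_csv: str, sensitive=SENSITIVE_FIELDS) -> str:
    """Return the CSV text with every column named in `sensitive` removed."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    # Keep the original column order, minus the sensitive columns.
    kept = [name for name in (reader.fieldnames or []) if name not in sensitive]

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=kept,
        extrasaction="ignore",  # silently skip the dropped columns in each row
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Running this step before the raw file is saved would mean the sensitive values never reach our storage, so nothing downstream (including BigQuery) would need to filter them again. Data already collected would still need the separate cleanup described in item 1.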

Notes

This issue can be separated into 2 phases:

  1. Scope the necessary changes and share them with Caltrans IT Security for review.
  2. Implement the changes.

Hi @evansiroky & @charlie-costanzo, similar to my comment on the other issue (#3334), I wanted to let you know that we also use the masked PAN field and the customer ID field, mostly for debt management monitoring.

The first six digits of the masked PAN tell us the BIN (Bank Identification Number), which is crucial to know when payments are not successful (bad debt). We also use Customer ID and Principal Customer ID to identify how many digital payment methods a customer has (basically, we monitor for retokenisation fraud).
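
For illustration, here is a rough sketch of the two checks described above. It assumes the masked PAN starts with six unmasked digits and that each record carries `customer_id` and `principal_customer_id` keys; the function names and the record shape are hypothetical and not taken from our actual reports.

```python
from collections import defaultdict


def bin_from_masked_pan(masked_pan: str):
    """Return the leading six digits (the BIN), or None if they are not all digits."""
    prefix = masked_pan[:6]
    if len(prefix) == 6 and prefix.isascii() and prefix.isdigit():
        return prefix
    return None


def payment_methods_per_principal(rows):
    """Count the distinct customer_id values seen under each principal_customer_id."""
    methods = defaultdict(set)
    for row in rows:
        methods[row["principal_customer_id"]].add(row["customer_id"])
    return {principal: len(ids) for principal, ids in methods.items()}
```

A principal customer with an unusually high count of payment methods is the kind of pattern we would flag for a closer look; the threshold for "unusual" is a business decision and is not shown here.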

Could you please not remove these fields? The PAN is masked, so it is not sensitive data; we see this kind of data in other transit agencies' data warehouses and reports too. Customer ID is an identifier for each payment method (e.g. mobile, watch) and is created by Littlepay, so I am not sure what makes it sensitive.

As of 9/10, we are awaiting confirmation from Caltrans Security that it is OK to keep this data, since there is a legitimate business need for it.