ByeByePii is a Python package for hashing personally identifiable information (PII). It was built to help make data lakes that store JSON files GDPR compliant.
- Analyzing Python dictionaries to identify PII
- Hashing PII in a given Python dictionary
The source code is currently hosted on GitHub at: https://github.com/falkzeh/ByeByePii
Binary installers for the latest released version are available at the Python Package Index (PyPI).
pip install ByeByePii
To avoid manually looking up every key in a Python dictionary, we can use the analyzeDict function. It walks through the dictionary and asks, key by key, whether each one should be added to the hash list.
import byebyepii
import json

if __name__ == '__main__':
    # Loading local JSON file
    with open('data.json') as json_file:
        data = json.load(json_file)

    # Analyzing the dictionary and creating our hash list
    key_list, subkey_list = byebyepii.analyzeDict(data)
$ python3 analyzeDict.py
Add BuyerInfo - BuyerEmail to hash list? (y/n) y
Add SalesChannel to hash list? (y/n) n
Add OrderStatus to hash list? (y/n) n
Add PurchaseDate to hash list? (y/n) n
Add ShippingAddress - StateOrRegion to hash list? (y/n) y
Add ShippingAddress - PostalCode to hash list? (y/n) y
Add ShippingAddress - City to hash list? (y/n) n
Add ShippingAddress - CountryCode to hash list? (y/n) n
Add LastUpdateDate to hash list? (y/n) n
Keys to hash: ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
Subkeys to hash: ['BuyerEmail', 'StateOrRegion', 'PostalCode']
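To preview which prompts to expect before running the interactive analysis, the key and subkey combinations of a dictionary can be listed up front. The following is a minimal sketch that does not use ByeByePii at all; `list_candidate_keys` is a hypothetical helper, not part of the package, and it assumes the data is nested at most one level deep, as in the example above.

```python
def list_candidate_keys(data):
    """Return (key, subkey) pairs for a dict nested at most one level deep.

    Top-level values that are not dicts yield (key, None).
    """
    pairs = []
    for key, value in data.items():
        if isinstance(value, dict):
            # One pair per entry of the nested dictionary
            for subkey in value:
                pairs.append((key, subkey))
        else:
            pairs.append((key, None))
    return pairs


# Hypothetical sample data modelled on the order record used in this README
order = {
    "BuyerInfo": {"BuyerEmail": "test@test.com"},
    "SalesChannel": "Website",
    "ShippingAddress": {"PostalCode": "DY9 0TH", "City": "STOURBRIDGE"},
}

for key, subkey in list_candidate_keys(order):
    # Mirror the "Key - Subkey" labels shown in the prompts
    print(key if subkey is None else f"{key} - {subkey}")
```

This only previews the candidate keys; the key lists themselves still come from analyzeDict.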
Using the key lists returned by analyzeDict, we can proceed to hash the PII in the dictionary with the hashPii function.
import byebyepii
import json

if __name__ == '__main__':
    # Loading local JSON file
    with open('data.json') as json_file:
        data = json.load(json_file)

    # Hashing the PII
    keys_to_hash = ['BuyerInfo', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress', 'ShippingAddress']
    subkeys_to_hash = ['BuyerEmail', 'StateOrRegion', 'PostalCode']
    hashed_pii = byebyepii.hashPii(data, keys_to_hash, subkeys_to_hash)

    # Writing the updated JSON file
    with open('hashed_data.json', 'w') as outfile:
        json.dump(hashed_pii, outfile)
Before:
{
    "BuyerInfo": {
        "BuyerEmail": "test@test.com"
    },
    "EarliestShipDate": "2022-01-01T23:59:59Z",
    "SalesChannel": "Website",
    "OrderStatus": "Shipped",
    "PurchaseDate": "2022-01-01T23:59:59Z",
    "ShippingAddress": {
        "StateOrRegion": "West Midlands",
        "PostalCode": "DY9 0TH",
        "City": "STOURBRIDGE",
        "CountryCode": "GB"
    },
    "LastUpdateDate": "2022-01-01T23:59:59Z"
}
After:
{
    "BuyerInfo": {
        "BuyerEmail": "037a51cb9162f51772eaf6b0fb02e1b5d0bf8219deacf723eeedc162209bfd33"
    },
    "EarliestShipDate": "2022-01-01T23:59:59Z",
    "SalesChannel": "Website",
    "OrderStatus": "Shipped",
    "PurchaseDate": "2022-01-01T23:59:59Z",
    "ShippingAddress": {
        "StateOrRegion": "08fa57d00de1936ebea7aeaf8e36d04510a5d885cfaa4f169c2b010d36ccaca4",
        "PostalCode": "714f02c01e20988ee273776dc218f44326c2f5839618b0c117413b0cc7d91701",
        "City": "STOURBRIDGE",
        "CountryCode": "GB"
    },
    "LastUpdateDate": "2022-01-01T23:59:59Z"
}
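The hashed values above are 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although this README does not state which algorithm or salting ByeByePii uses. The sketch below illustrates the general idea with the standard library only, assuming unsalted SHA-256; `hash_fields` and `sha256_hex` are hypothetical helpers and not the package's implementation.

```python
import copy
import hashlib


def sha256_hex(value):
    """Return the SHA-256 hex digest of a value's string form (assumed scheme)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def hash_fields(data, fields):
    """Return a copy of `data` with the given (key, subkey) fields hashed.

    `fields` is an iterable of (key, subkey) tuples; a subkey of None means
    the top-level value itself is hashed. Fields missing from `data` are skipped.
    """
    result = copy.deepcopy(data)  # leave the caller's dictionary untouched
    for key, subkey in fields:
        if key not in result:
            continue
        if subkey is None:
            result[key] = sha256_hex(result[key])
        elif isinstance(result[key], dict) and subkey in result[key]:
            result[key][subkey] = sha256_hex(result[key][subkey])
    return result


# Hypothetical sample data modelled on the "Before" record
record = {
    "BuyerInfo": {"BuyerEmail": "test@test.com"},
    "SalesChannel": "Website",
    "ShippingAddress": {"PostalCode": "DY9 0TH", "City": "STOURBRIDGE"},
}

hashed = hash_fields(
    record,
    [("BuyerInfo", "BuyerEmail"), ("ShippingAddress", "PostalCode")],
)
print(hashed["ShippingAddress"]["City"])         # untouched field
print(len(hashed["BuyerInfo"]["BuyerEmail"]))    # digest length in hex characters
```

An unsalted hash is deterministic, so equal inputs always produce equal outputs. That keeps hashed columns usable for joins and grouping, but it also means low-entropy values such as postal codes could be recovered by hashing every candidate value and comparing.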