tink-crypto/tink

BigQuery function KEYSET_CHAIN can't decrypt a Tink encrypted keyset

James-DBA-Anderson opened this issue · 2 comments

BigQuery has a function named KEYS.KEYSET_CHAIN. The first line of its documentation is:

"Can be used in place of the keyset argument to the AEAD and deterministic encryption functions to pass a Tink keyset that is encrypted with a Cloud KMS key."

But when I pass it my bytes representation of my encrypted key, I get the following error:

"google.api_core.exceptions.BadRequest: 400 Query error: Decryption failed: verify that 'name' refers to the correct CryptoKey."

I'm using Tink 1.7 with Python 3.9 on an M1 Mac.

I have been able to get it to work without encrypting the DEK with a key in KMS, in the way described in #373, but I can't get it to work once the key is encrypted.

Perhaps I am encrypting the key incorrectly or decrypting it in BigQuery in an unsupported way?

I'm using Python, and the byte representation of the key does look odd:
b'\x12\xbd\x01\n$\x00\xa4\xfd\x16\x1d\xb6Yk\xd2\xa5RqW\xfePQ\xd9\x9c\xf6[|\x84\xa7l\xaf\xa68\xfa\xa0\x82:\xbb(;\xb9\xc4\x12\x94\x01\x00\xd1\x11\xc8\xe1\xc4s\xb2\xe9\xfd\x13\x8c\x84\xc4\xb4\xb1\xcc\xf6v\x89\xba\x07Z\x1d\x82\x93\x14)A\x82\x94\xc0U\x0f\xd8\xe9\x94\xab:\x9c03\xea*g\x820\xceY\xb6=o\xc8-\xa3\xa8-\x15\x0bEi\x9a\x13\x00q\xf0\xb4Q\x14\xd46\xa0\x9cN\x84;9\xeb\xd9\xa3$\x91\x8bi\xc9\xc5\xaa/~\xdcC4#\xcc\xc2T\xbc\xa7\xb3\xb0\xa6\xcc7oz\xe2\xa1J\xc5\xd9\x1f\xa0\x15\x9e\x02\xfd\x97\xd04\xbdLf\xc43\xce\x1b\xf9loDc\xb4W\xe4Z\x07\x1e.)\x85\xc7\xf1\xff\xb7\xb1J\x0b\xd6\x1aD\x08\xe8\xbd\xd9\xa2\x06\x12<\n0type.googleapis.com/google.crypto.tink.AesGcmKey\x10\x01\x18\xe8\xbd\xd9\xa2\x06 \x03'

The documentation for BigQuery's AEAD functions seems to expect something like:
b'\012\044\000\107\275\360\176\264\206\332\235\215\304...'

Perhaps this is the problem?

Any help would be appreciated.

Repro

To get the script to run I had to download the roots.pem file from https://github.com/grpc/grpc/blob/master/etc/roots.pem

I placed the roots.pem file in the same directory as my script.

I then set this environment variable GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=roots.pem

import io
import tink
from tink import aead, cleartext_keyset_handle
from tink.integration import gcpkms
from google.cloud import bigquery

key_uri = 'gcp-kms://projects/YOUR_PROJECT/locations/europe-west2/keyRings/YOUR_KEYRING/cryptoKeys/YOUR_KEY'

aead.register()
gcpkms.GcpKmsClient.register_client(key_uri=key_uri, credentials_path="")

# Create an AEAD primitive backed by the Cloud KMS key (the KEK)
template = aead.aead_key_templates.create_kms_aead_key_template(key_uri=key_uri)
handle = tink.KeysetHandle.generate_new(template)
kms_aead_primitive = handle.primitive(aead.Aead)

# Generate the DEK
key_template = aead.aead_key_templates.AES256_GCM_RAW
keyset_handle = tink.KeysetHandle.generate_new(key_template)
aead_primitive = keyset_handle.primitive(aead.Aead)

# Encrypt data with the DEK
encrypted_value = aead_primitive.encrypt(
    plaintext='encrypt_me'.encode('utf-8'),
    associated_data='test'.encode('utf-8')
)

# Get key by writing it out and capturing that with an io stream
stream = io.BytesIO()
# Encrypt DEK with kms aead primitive
keyset_handle.write(tink.BinaryKeysetWriter(stream), kms_aead_primitive)
stream.seek(0)
key = stream.read()

# Decrypt in BigQuery
bq_client = bigquery.Client(location='europe-west2')

sql = f"""
DECLARE kms_resource_name STRING;
DECLARE first_level_keyset BYTES;
DECLARE associated_data STRING;

SET kms_resource_name = '{key_uri}';
SET first_level_keyset = {key};
SET associated_data = 'test';

SELECT
    AEAD.DECRYPT_STRING(
        KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
        {encrypted_value},
        associated_data
    ) as Decrypted

"""

job_config = bigquery.QueryJobConfig(
    priority=bigquery.QueryPriority.BATCH
)

job = bq_client.query(sql, job_config=job_config)

result = job.result()

When you call
keyset_handle.write(tink.BinaryKeysetWriter(stream), kms_aead_primitive)
then the output is not just the encrypted keyset, it is a container that contains the encrypted keyset with some optional additional metadata about the keys in the keyset.
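To make the container visible, here is a minimal sketch that splits a serialized protobuf message into its top-level length-delimited fields using only the standard library. As far as I can tell from tink.proto, the container is an EncryptedKeyset message with the ciphertext in field 2 and the key metadata in field 3. That matches the bytes pasted above, which start with \x12\xbd\x01 (field 2, length 189) and later contain \x1aD (field 3, length 68). The field-2 payload begins with \n$\x00, the same leading bytes as the \012\044\000 in the BigQuery documentation example. The helper names and the synthetic input below are illustrative assumptions, not part of Tink.

```python
def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a protobuf varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def length_delimited_fields(data: bytes) -> dict[int, bytes]:
    """Return {field_number: payload} for top-level length-delimited fields."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:  # this sketch only handles length-delimited fields
            raise ValueError(f"unexpected wire type {wire_type}")
        length, pos = read_varint(data, pos)
        fields[field_number] = data[pos:pos + length]
        pos += length
    return fields


# Synthetic stand-in for the output of keyset_handle.write(...):
# field 2 carries the ciphertext, field 3 carries the metadata.
container = b"\x12\x03abc" + b"\x1a\x02hi"
fields = length_delimited_fields(container)
inner_ciphertext = fields[2]
keyset_metadata = fields[3]
```

With Tink installed, parsing the same bytes with tink.proto.tink_pb2.EncryptedKeyset should give the same split. Whether the field-2 payload on its own is accepted by KEYS.KEYSET_CHAIN has not been tested in this thread.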

KEYS.KEYSET_CHAIN probably doesn't use this data format. I think it expects only an encrypted keyset, encrypted without associated data. You can generate this with:

stream = io.BytesIO()
writer = tink.BinaryKeysetWriter(stream)
cleartext_keyset_handle.write(writer, keyset_handle)
serialized_keyset = stream.getvalue()
encrypted_keyset = kms_aead_primitive.encrypt(serialized_keyset, b'')

and then use encrypted_keyset as "first_level_keyset" in KEYS.KEYSET_CHAIN.
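One way to put those bytes into the SQL text without relying on Python's bytes repr is to hex-encode them and let BigQuery decode them with FROM_HEX. This is only a sketch: the helper name bq_bytes_expr is made up for illustration, and the three-byte value is a placeholder rather than a real encrypted keyset.

```python
def bq_bytes_expr(data: bytes) -> str:
    """Render bytes as a BigQuery expression such as FROM_HEX('0a2400')."""
    return f"FROM_HEX('{data.hex()}')"


# Placeholder bytes; in the script above this would be the result of
# kms_aead_primitive.encrypt(serialized_keyset, b'').
encrypted_keyset = b"\n$\x00"

sql_fragment = f"SET first_level_keyset = {bq_bytes_expr(encrypted_keyset)};"
```

The same helper could be used for the ciphertext argument of AEAD.DECRYPT_STRING.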

Does that work?

I have tested what I described above, and it works.

So KEYS.KEYSET_CHAIN takes a KMS key URI and an encrypted keyset as input, where the encrypted keyset is a serialized Tink keyset encrypted by the KMS key.

The KeysetHandle's "write" method does not output the same format.

I added some tests in Java here:
eb30909
dbee960