paragonie/ciphersweet

Recommendations for Zero-downtime key rotation?

stevenmaguire opened this issue · 6 comments

Thanks for this great package! I am looking for a little direction, advice, and/or recommendations. I have a database table with >300k rows of data where a couple fields need to be encrypted with blind indexes. The encryption process with blind indexes enabled takes double digit hours (10+) to complete while the encryption process without blind indexes takes <20 minutes.

I understand there is a design pattern for rotating the keys and the package is designed to support a single key ("the key").

With an encryption/key rotation process that takes 10+ hours, it is not feasible to go "down for maintenance" that long and if the rotation process takes place while "up" then it stands to reason that there would be rows that are encrypted with different keys and any one time.

I am endeavoring to find a solution wherein I can have a list of keys available (the newest one, and the previous most recent one) and add some tolerance to the decryption step for attempting all of those keys until one hits. This would, in theory, allow me to gracefully handle the use case I described above without throwing workflow disruption exceptions.

The idea being, we can decrypt with more than one key but upon encryption the newest one would be used.

I am not sure if this use case a) was intentionally avoided (for reasons...?), b) is already supported (and not documented well?), or c) is reasonable (and requires some refactoring?).

If it is "a" or "b", can you provide some context and help?

If it is "c", I am happy to help contribute some ideas/code to help get to that capability - I am just not sure which high-level approach would be more appropriate/preferred to this group.

At the moment, the ParagonIE\CipherSweet\EncryptedRow::decryptRow method has my attention.

public function decryptRow(
#[\SensitiveParameter]
array $row
): array {
/** @var array<string, string|int|float|bool|null|scalar[]> $return */
$return = $row;
$backend = $this->engine->getBackend();
if ($this->engine->isMultiTenantSupported()) {
$tenant = $this->engine->getTenantFromRow($row, $this->tableName);
$this->engine->setActiveTenant($tenant);
}
foreach ($this->fieldsToEncrypt as $field => $type) {
$key = $this->engine->getFieldSymmetricKey(
$this->tableName,
$field
);
if (!array_key_exists($field, $row)) {
if (!$this->permitEmpty) {
throw new EmptyFieldException('Field is not defined in row: ' . $field);
}
continue;
}
if (\is_null($row[$field])) {
$return[$field] = null;
continue;
}
if (
!empty($this->aadSourceField[$field])
&&
\array_key_exists($this->aadSourceField[$field], $row)
) {
$aad = (string) $row[$this->aadSourceField[$field]];
} else {
$aad = '';
}
if ($type === Constants::TYPE_JSON && !empty($this->jsonMaps[$field])) {
// JSON is a special case
$jsonEncryptor = new EncryptedJsonField(
$backend,
$key,
$this->jsonMaps[$field],
$this->jsonStrict[$field]
);
$return[$field] = $jsonEncryptor->decryptJson($row[$field], $aad);
continue;
}
$plaintext = $backend->decrypt($row[$field], $key, $aad);
$return[$field] = $this->convertFromString($plaintext, $type);
}
return $return;
}

What do you think?

The encryption process with blind indexes enabled takes double digit hours (10+) to complete while the encryption process without blind indexes takes <20 minutes.

You probably want fast blind indexes. https://github.com/paragonie/ciphersweet/blob/master/src/BlindIndex.php#L39

The default uses a password-hashing algorithm (PBKDF2 or Argon2id). Fast hashes use HMAC or BLAKE2b.

This should reduce it to under 1 hour, moving forward.

I understand there is a design pattern for rotating the keys and the package is designed to support a single key ("the key").

Recently, we introduced the concept of a Multi-Tenant Key Provider. We've developed a few implementations for our clients' specific needs, but haven't published a general-purpose open source one yet. Multi-tenancy is something a bit particular to support.

That said, you could easily leverage this multi-tenancy support for key rotation.

Thanks for the prompt response.

You probably want fast blind indexes.

Thanks. I'll take a look at that option and establish new baselines.

That said, you could easily leverage this multi-tenancy support for key rotation.

I thought you might mention that. I saw that support hinted in the code but didn't see too many examples or much direction. I'll explore that avenue as well.


I'll close this discussion for now. If it's alright, I'll re-open with additional findings or questions.

Happy to help. Please fell free to file new issues or reopen as you see fit.

Thanks for the previous direction on this topic. I've made some headway in achieving the desired behavior. I did notice two things that I want to follow-up with. Again, I'm happy to split this out and take your lead on discussion triage as needed, if you have preferences. In the meantime, I want to put this out there here:

  1. Setting the the Engine's Active Tenant on the decrypt operation AND not also using getTenantFromRow to locate the possibly updated/newly desired tenant for a given row before attempting the encrypt operation feels like a code smell and produces some misleading side effects.
  2. Symmetrical support for multiple, row based Backends (just as KeyProviders are supported) feels essential to ensure portability and accessibility of a safe encryption lifecycle.

Before getting into the details, I would like to point out that I have been familiarizing myself with this code and the supporting test cases. I am happy to help contribute towards this effort if it makes sense directionally. That work feels involved. There is a broad design direction here that I want to understand before jumping into the work. I am not trying to armchair quarterback here...


  1. Setting the the Engine's Active Tenant on the decrypt operation AND not using getTenantFromRow to locate the possibly updated/newly desired tenant for a given row before attempting the encrypt operation feels like a code smell and produces some misleading side effects.

The behavior I am reporting here was discovered when attempting to write a utility to rotate multi-tenant keys. Because engine in this Encrypted Row class is acting like a singleton, updating it within the package upon read and not exposing a way to change the key preference before write feels like a hole.

Here's a line where this happens.

During decrypt attempt and BEFORE the decrypt logic executes, we detect the appropriate tenant from the given row and update the row's engine to set its active tenant configuration to match the detected tenant.

if ($this->engine->isMultiTenantSupported()) {
$tenant = $this->engine->getTenantFromRow($row, $this->tableName);
$this->engine->setActiveTenant($tenant);
}

Then some application lifecycle activity happens, outside of the scope of this library. One could expect that one of the side effects of this logic is to update/change/rotate the desired tenant of the row.

Then during encrypt attempt, the tenant is selected/used based on the engine's active tenant (set during decrypt), regardless of the given row's preference (which may have changed). So the row get's encrypted with the same active tenant that was originally pulled from the DB, and any logic within injectTenantMetadata to attempt to update the row's metadata with the appropriate/desired tenant ID is useless because the encryption is already done.

public function encryptRow(
#[\SensitiveParameter]
array $row
): array {
/** @var array<string, string|int|float|bool|null|scalar[]> $return */
$return = $row;
$backend = $this->engine->getBackend();
foreach ($this->fieldsToEncrypt as $field => $type) {
if (!\array_key_exists($field, $row)) {
throw new ArrayKeyException(
'Expected value for column ' .
$field .
' on array, nothing given.'
);
}
$key = $this->engine->getFieldSymmetricKey(
$this->tableName,
$field
);
if (
!empty($this->aadSourceField[$field])
&&
\array_key_exists($this->aadSourceField[$field], $row)
) {
$aad = $this->coaxAadToString($row[$this->aadSourceField[$field]]);
} else {
$aad = '';
}
if ($type === Constants::TYPE_JSON && !empty($this->jsonMaps[$field])) {
// JSON is a special case
$jsonEncryptor = new EncryptedJsonField(
$backend,
$key,
$this->jsonMaps[$field],
$this->jsonStrict[$field]
);
$return[$field] = $jsonEncryptor->encryptJson($this->coaxToArray($row[$field]), $aad);
continue;
}
if (!is_scalar($row[$field])) {
throw new TypeError('Invalid type for ' . $field);
}
$plaintext = $this->convertToString($row[$field], $type);
$return[$field] = $backend->encrypt($plaintext, $key, $aad);
}
/** @var array<string, string> $return */
if ($this->engine->isMultiTenantSupported()) {
return $this->engine->injectTenantMetadata($return, $this->tableName);
}
return $return;
}

Without any changes, the only way to affect this change in desired tenant key is to manually update the EncryptedRow object's engine configuration in between decrypt and encrypt:

$row->getEngine()->setActiveTenant('new-tenant-key-identifier');

The intent of this has also likely already happened during app initialization - "I want the my engine to use X key as a default."

If the idea is that the getTenantFromRow method, which is expected to be implemented by the downstream consumer, is intended to manage how a given row is encrypted, then it feels like the getTenantFromRow method could be valuable, and should be used in the encrypt method to direct the encrypt logic used on a per row basis. Something like:

CleanShot 2023-04-11 at 11 47 49@2x

Even in the test for this behavior, the test is working hard to manage the active tenant using a similar approach that I referenced above:

CleanShot 2023-04-11 at 11 49 26@2x

It feels like the test should be something along the lines of fetching the decrypted row, updating the downstream-consumer-defined-tenant-id-column to be a valid tenant id, then encrypt the row and it succeed.

If the row is brand new and has never been encrypted before, therefore not having been through a decrypt operation, it stands to reason the expectation would remain that the current "default" active tenant would be used. Since this is bleeding edge support, I think it is reasonable to expect downstream consumers to implement that check/logic in their getTenantFromRow implementation.


  1. Symmetrical support for multiple, row based Backends (just as KeyProviders as supported) feels essential to ensure portability and accessibility of safe, encryption.

Considering each encrypt/decrypt lifecycle is dependent on the combination of the KeyProvider and Backend, each row feels like it should also be able to direct it's preferred backend just like it can direct its key provider.

I acknowledge that the backend is likely to be changed/updated far less frequently than a key for a given multi-tenant use case, but it still feels like a requirement from a security/vulnerability response perspective. If a zero-day vulnerability is detected for a given backend, it would be very difficult to locate and update any affected rows without some symmetrical support that is provided for the key providers.

I'm sure this is a lot to process and ingest. However, I would like to solicit some feedback here. What are your thoughts on this?

Is there any response you can offer here?