googleapis/google-cloud-php

Deferred keys are malformed in Firestore (Datastore mode)


Summary:
The lookupBatch($entityKeys) method returns a dictionary with keys "found", "missing", and "deferred".

As per the docs, keys in the deferred array should be retried with lookupBatch($deferredKeys). This fails with the message:
{"field": "keys[21].path[0]","description": "Invalid value at 'keys[21].path[0]' (name), Starting an object on a scalar field"}

It fails because the client (or possibly API) sometimes returns keys that are formatted incorrectly with an additional array.

For $deferredEntityKey->path(), the expected valid output is:
[{"kind":"Relationship","name":"5458349302218752_5748997628362752"}]

Actual output for deferred key is:
[{"kind":[{"kind":"Relationship","name":"5458349302218752_5748997628362752"}],"name":{"projectId":"placeholder"}}]
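The two shapes above can be told apart without touching the client library. The following is a minimal sketch on plain decoded arrays, using the two JSON samples from this report; `isMalformedPath` is a hypothetical helper name, and the check assumes the malformed shape always puts an array where the `kind` string should be, as in the one example shown here.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when the first path element carries an array in "kind"
 * instead of a string, which is the malformed shape seen in this report.
 */
function isMalformedPath(array $path): bool
{
    return is_array($path[0]['kind'] ?? null);
}

// The expected and actual outputs quoted above, decoded to PHP arrays.
$valid = json_decode(
    '[{"kind":"Relationship","name":"5458349302218752_5748997628362752"}]',
    true
);
$broken = json_decode(
    '[{"kind":[{"kind":"Relationship","name":"5458349302218752_5748997628362752"}],"name":{"projectId":"placeholder"}}]',
    true
);

var_dump(isMalformedPath($valid));  // bool(false)
var_dump(isMalformedPath($broken)); // bool(true)
```

In the malformed sample, `kind` holds what looks like a whole path and `name` holds what looks like key options (`projectId`), which might suggest the key is being constructed with its arguments shifted somewhere. That is only a guess from the output.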

To make matters stranger, some keys in "deferred" are formatted correctly and some are not. So we must repair keys conditionally, which isn't ideal:

    /**
     * The "deferred" entry returned by lookupBatch() has type [Google\Cloud\Datastore\Key], but most of its keys are malformed.
     * Some keys wrap the actual path in an extra array, while others are valid.
     * Malformed keys must be repaired before they can be passed back to lookupBatch().
     */
    private static function repairDeferredKey(EntityKey $inputKey): EntityKey {
        $path = $inputKey->path(); // Array of path elements.
        $firstItem = $path[0]; // Associative array with "kind" and "name" or "id".
        $topKind = $firstItem["kind"]; // String for a valid key, array for a malformed key.

        // 99% of deferred keys are malformed, but occasionally they are valid (strange).
        if (is_string($topKind)) {

            // No need to change the key, it's in the expected valid format.
            // This is rare, but does occur.
            return $inputKey;

        } elseif (is_array($topKind)) {

            // The real path element is nested inside the "kind" array.
            $actualKeyValues = $topKind[0];
            $kind = $actualKeyValues["kind"];
            $name = $actualKeyValues["name"] ?? null;
            $id = $actualKeyValues["id"] ?? null;
            $nameOrId = $name ?? $id;

            // Construct a new, valid key. Only the first nested path element is used.
            return self::get()->key($kind, $nameOrId);

        } else {
            throw new \Exception("Failed to repair deferred key of type " . gettype($topKind));
        }
    }
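One limitation of the workaround above is that it rebuilds the key from the first nested element only, so a key with ancestors would lose them. Below is a sketch of unwrapping the whole path instead, written against plain arrays so it runs without the client library. It assumes the nested `kind` array holds the complete original path, which is an inference from the single example in this report and not confirmed behaviour. `unwrapDeferredPath` and the `Parent` ancestor element are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Returns a usable path: the input itself when it is already valid,
 * or the array nested under "kind" when the path is malformed.
 */
function unwrapDeferredPath(array $path): array
{
    $first = $path[0] ?? null;
    if (!is_array($first) || !array_key_exists('kind', $first)) {
        throw new InvalidArgumentException('Path has no usable first element.');
    }
    if (is_string($first['kind'])) {
        return $path; // Already in the expected format.
    }
    if (is_array($first['kind'])) {
        return $first['kind']; // Assumed to be the full original path.
    }
    throw new InvalidArgumentException('Unexpected kind type: ' . gettype($first['kind']));
}

// Malformed shape from this report, with a hypothetical ancestor added.
$broken = [[
    'kind' => [
        ['kind' => 'Parent', 'id' => '1'],
        ['kind' => 'Relationship', 'name' => '5458349302218752_5748997628362752'],
    ],
    'name' => ['projectId' => 'placeholder'],
]];

print_r(unwrapDeferredPath($broken));
```

Turning the unwrapped path back into a `Key` object is deliberately left out, since that depends on how the application constructs keys (project, namespace, database).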

Google's documentation says the maximum number of keys that can be looked up at once is 1,000, but the observable limit appears to be closer to 300 keys; any keys beyond that are returned as deferred. What is the actual limit, and can we please document it somewhere?

We can reproduce this consistently for a specific set of keys, but whether a key is deferred appears to be decided by an internal, Google-controlled process, so reproducing it in a new project may be challenging. It does not occur in the Datastore emulator.

Environment details

  • OS: App Engine, Standard
  • PHP version: 8.2
  • Composer: "google/cloud-core": "^1.52", "google/cloud-datastore": "^1.25",
  • Actual versions used: Google Core v1.60.0, Datastore v1.32.1

Steps to reproduce

  1. Perform a batch lookup with a result that has "deferred" keys.
  2. Send the deferred keys back to lookupBatch().
  3. Observe that the call sometimes fails because the deferred keys are malformed.

Code example

Additional context for how we batch lookup entities, retrying deferred keys:

    public static function safeLookupBatch(array $keys): array {
        $ds = Datastore::get();
        $maxKeysPerBatch = self::MaxLookupKeys; // Maximum keys processed in a single batch.
        $allResults = []; // Final results containing all found entities.
        $remainingKeysToFetch = $keys; // Keys that are still to be looked up.
        $maxDeferredRetries = 3; // Maximum number of lookup passes, counting the initial lookup.
        $deferredRetryCount = 0; // Number of lookup passes started so far.

        while (!empty($remainingKeysToFetch)) {
            $currentBatchResults = []; // Stores results from the current batch of lookups.
            $deferredKeys = []; // Stores any deferred keys returned from the datastore.

            // Count this pass. The initial lookup is pass 1, so at most $maxDeferredRetries - 1 retries are made.
            $deferredRetryCount++;
            if ($deferredRetryCount > $maxDeferredRetries) {
                throw new \Exception(
                    sprintf(
                        "Maximum lookup passes exceeded: %d passes made, %d keys remaining.",
                        $maxDeferredRetries,
                        count($remainingKeysToFetch)
                    )
                );
            }
    
            // Batch process $remainingKeysToFetch in chunks of $maxKeysPerBatch (includes any deferred keys).
            foreach (array_chunk($remainingKeysToFetch, $maxKeysPerBatch) as $batchKeys) {
                // Perform lookup.
                $batchResults = $ds->lookupBatch($batchKeys);
    
                // Collect found entities.
                $foundEntities = $batchResults["found"] ?? [];
                $currentBatchResults = array_merge($currentBatchResults, $foundEntities);
    
                // Collect and process deferred keys, if any.
                $deferredEntities = $batchResults["deferred"] ?? null;
                if (!empty($deferredEntities)) {
                    $deferredEntities = self::repairDeferredKeyArray($deferredEntities);
                    $deferredKeys = array_merge($deferredKeys, $deferredEntities);
                }
            }
    
            // Merge the current batch results into the overall results.
            $allResults = array_merge($allResults, $currentBatchResults);
    
            // Update remaining keys for the next iteration based on deferred keys.
            $remainingKeysToFetch = $deferredKeys;
        }
    
        return $allResults;
    }
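Since the emulator never defers keys, the retry loop is hard to exercise locally. One option is to inject the lookup as a callable and drive it with a stub that defers on purpose. The sketch below does that with plain integers standing in for keys. `lookupAll` and the stub are hypothetical, the 300-key cutoff only mirrors the rough figure observed above, and "missing" results are ignored for brevity.

```php
<?php
declare(strict_types=1);

/**
 * Looks up all keys, re-sending whatever the lookup reports as deferred.
 * $lookup must return an array with optional "found" and "deferred" lists.
 */
function lookupAll(array $keys, callable $lookup, int $maxPasses = 3): array
{
    $found = [];
    $remaining = $keys;
    for ($pass = 1; $remaining !== []; $pass++) {
        if ($pass > $maxPasses) {
            throw new RuntimeException(
                sprintf('%d keys still deferred after %d passes.', count($remaining), $maxPasses)
            );
        }
        $result = $lookup($remaining);
        $found = array_merge($found, $result['found'] ?? []);
        $remaining = $result['deferred'] ?? [];
    }
    return $found;
}

// Stub lookup: answers at most 300 keys per call and defers the rest.
$stub = static function (array $keys): array {
    return [
        'found' => array_slice($keys, 0, 300),
        'deferred' => array_slice($keys, 300),
    ];
};

$found = lookupAll(range(1, 700), $stub);
echo count($found), "\n"; // 700
```

With this stub, 700 keys resolve in exactly three passes, while 1,000 keys would hit the pass limit and throw, which is the same failure mode safeLookupBatch() reports when keys keep coming back deferred.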

Thank you, I love this project